Preface



This volume contains the 70 contributed papers and abstracts of two invited lectures presented at the 12th Annual European Symposium on Algorithms (ESA 2004), held in Bergen, Norway, September 14-17, 2004. The papers in each section of the proceedings are arranged alphabetically. The three distinguished invited speakers were David Eppstein, Michael Fellows and Monika Henzinger.

As in the last two years, ESA had two tracks, with separate program committees, which dealt respectively with:

- the design and mathematical analysis of algorithms (the “Design and Analysis” track);

- real-world applications, engineering and experimental analysis of algorithms (the “Engineering and Applications” track).

Previous ESAs were held at Bad Honnef, Germany (1993); Utrecht, The Netherlands (1994); Corfu, Greece (1995); Barcelona, Spain (1996); Graz, Austria (1997); Venice, Italy (1998); Prague, Czech Republic (1999); Saarbrücken, Germany (2000); Århus, Denmark (2001); Rome, Italy (2002); and Budapest, Hungary (2003). The predecessor to the Engineering and Applications track of ESA was the annual Workshop on Algorithm Engineering (WAE). Previous WAEs were held in Venice, Italy (1997); Saarbrücken, Germany (1998); London, UK (1999); Saarbrücken, Germany (2000); and Århus, Denmark (2001).

The proceedings of the previous ESAs were published as Springer-Verlag’s LNCS volumes 726, 855, 979, 1284, 1461, 1643, 1879, 2161, 2461, and 2832. The proceedings of the WAEs from 1999 onwards were published as Springer-Verlag’s LNCS volumes 1668, 1982, and 2141.

Papers were solicited in all areas of algorithmic research, including but not limited to: computational biology, computational finance, computational geometry, databases and information retrieval, external-memory algorithms, graph and network algorithms, graph drawing, machine learning, network design, online algorithms, parallel and distributed computing, pattern matching and data compression, quantum computing, randomized algorithms, and symbolic computation. The algorithms could be sequential, distributed, or parallel. Submissions were strongly encouraged in the areas of mathematical programming and operations research, including: approximation algorithms, branch-and-cut algorithms, combinatorial optimization, integer programming, network optimization, polyhedral combinatorics, and semidefinite programming.

Each extended abstract was submitted to exactly one of the two tracks, and a few abstracts were switched from one track to the other at the discretion of the program chairs during the reviewing process. The extended abstracts were read by at least four referees each, and evaluated on their quality, originality, and relevance to the symposium. The program committees of both tracks met at King’s College London at the end of May. The Design and Analysis track selected for presentation 52 out of 158 submitted abstracts. The Engineering and Applications track selected for presentation 18 out of 50 submitted abstracts.
The program committees of the two tracks consisted of: 



Design and Analysis Track

Karen Aardal (CWI, Amsterdam)
Susanne Albers, Chair (University of Freiburg)
Timothy Chan (University of Waterloo)
Camil Demetrescu (University of Rome “La Sapienza”)
Rolf Fagerberg (BRICS Århus and University of Southern Denmark)
Paolo Ferragina (University of Pisa)
Fedor Fomin (University of Bergen)
Claire Kenyon (École Polytechnique, Palaiseau)
Elias Koutsoupias (University of Athens and UCLA)
Klaus Jansen (University of Kiel)
Ulrich Meyer (MPI Saarbrücken)
Michael Mitzenmacher (Harvard University)
Joseph (Seffi) Naor (Technion, Haifa)
Micha Sharir (Tel Aviv University)
Peter Widmayer (ETH Zürich)
Gerhard Woeginger (University of Twente and TU Eindhoven)



Engineering and Applications Track

Bo Chen (University of Warwick)
Ulrich Derigs (University of Cologne)
Andrew V. Goldberg (Microsoft Research, Mountain View)
Roberto Grossi (University of Pisa)
Giuseppe F. Italiano (University of Rome “Tor Vergata”)
Giuseppe Liotta (University of Perugia)
Tomasz Radzik, Chair (King’s College London)
Marie-France Sagot (INRIA Rhône-Alpes)
Christian Scheideler (Johns Hopkins University)
Jop F. Sibeyn (University of Halle)
Michiel Smid (Carleton University)



ESA 2004 was held along with the 4th Workshop on Algorithms in Bioinformatics (WABI 2004), the 4th Workshop on Algorithmic Methods and Models for Optimization of Railways (ATMOS 2004), the 2nd Workshop on Approximation and Online Algorithms (WAOA 2004), and an International Workshop on Parameterized and Exact Computation (IWPEC 2004) in the context of the combined conference ALGO 2004. The organizing committee of ALGO 2004 consisted of Pinar Heggernes (chair), Fedor Fomin (co-chair), Eivind Coward, Inge Jonassen, Fredrik Manne, and Jan Arne Telle, all from the University of Bergen.

ESA 2004 was sponsored by EATCS (the European Association for Theoretical Computer Science), the University of Bergen, the Research Council of Norway, IBM, HP and SGI. The EATCS sponsorship included an award of EUR 500 for the authors of the best student paper at ESA 2004. The winners of the prize were Marcin Mucha and Piotr Sankowski for their paper “Maximum Matching in Planar Graphs via Gaussian Elimination”.

Finally, we would like to thank the members of the program committees for 
their time and work devoted to the paper selection process. 
A Survey of FPT Algorithm Design Techniques
with an Emphasis on Recent Advances and 
Connections to Practical Computing 



Michael R. Fellows 

School of Electrical Engineering and Computer Science 
University of Newcastle, University Drive, Callaghan NSW 2308, Australia 
mfellows@cs.newcastle.edu.au



Abstract. The talk will survey the rich toolkit of FPT algorithm design methodologies that has developed over the last 20 years, including “older” techniques such as well-quasi-ordering, bounded treewidth, color-coding, reduction to a problem kernel, and bounded search trees (which have continued to deepen and advance), as well as some recently developed approaches such as win/win’s, greedy localization, iterative compression, and crown decompositions.



Only in the last few years has it become clear that there are two distinct theoretical “games” being played in the effort to devise better and better FPT algorithms for various problems, that is, algorithms with a running time of $O(f(k)\,n^c)$, where $c$ is a fixed constant, $k$ is the parameter, and $n$ is the total input size. The first of these games is the obvious one of improving the parameter cost function $f(k)$. For example, the first FPT algorithm for Vertex Cover had a running time of $2^k n$. After a series of efforts, the best currently known algorithm (due to Chandran and Grandoni, in a paper to be presented at IWPEC 2004) has a parameter cost function of $f(k) = (1.2745)^k k^4$. The second theoretical game being played in FPT algorithm design has to do with efficient kernelization. The naturality and universality of this game comes from the observation that a parameterized problem is FPT if and only if there is a polynomial-time preprocessing (also called data reduction or kernelization) algorithm that transforms the input $(x, k)$ into $(x', k')$, where $k' \le k$ and $|x'| \le g(k)$ for some kernel-bounding function $g(k)$, and where of course $(x, k)$ is a yes-instance if and only if $(x', k')$ is a yes-instance. In other words, modulo polynomial-time preprocessing, all of the difficulty, and even the size of the input that needs to be considered, is bounded by a function of the parameter. The natural theoretical game from this point of view about FPT is to improve the kernel-bounding function $g(k)$. For example, after strenuous efforts, a kernel-bounding function of $g(k) = 335k$ has recently been achieved (by Alber et al.) for the Planar Dominating Set problem.
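To make the kernelization game concrete, here is a minimal Python sketch of one classical data-reduction rule for Vertex Cover: the high-degree rule usually attributed to Buss. This rule is not one of the techniques named in the abstract. The function name `buss_kernel` and the adjacency-set graph representation are our own assumptions. The rule works as follows. A vertex of degree greater than $k$ must belong to every vertex cover of size at most $k$, because otherwise all of its more than $k$ neighbours would have to be in the cover. Such a vertex is therefore taken into the cover and $k$ is decremented. Once no such vertex remains, a yes-instance has at most $k'^2$ edges, which gives a kernel with $g(k) = O(k^2)$.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected simple graph


def buss_kernel(graph: Graph, k: int) -> Optional[Tuple[Graph, int, Set[int]]]:
    """Kernelize a Vertex Cover instance (graph, k) with the high-degree rule.

    Returns (reduced graph, reduced parameter k', forced vertices), or None when
    the rule proves that no vertex cover of size <= k exists.
    """
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    forced: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for v in list(g):
            if k < 0:
                return None
            if len(g[v]) > k:
                # v lies in every cover of size <= k: otherwise its > k neighbours would.
                for u in g[v]:
                    g[u].discard(v)
                del g[v]
                forced.add(v)
                k -= 1
                changed = True
    if k < 0:
        return None
    g = {v: nbrs for v, nbrs in g.items() if nbrs}  # isolated vertices never help
    num_edges = sum(len(nbrs) for nbrs in g.values()) // 2
    if num_edges > k * k:
        # k vertices of degree <= k can cover at most k^2 edges.
        return None
    return g, k, forced
```

For example, a star with centre 0 and five leaves, with $k = 1$, reduces to the empty graph with $k' = 0$ and forced vertex set {0}.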

Some of the new techniques to be surveyed pertain to the $f(k)$ game, some address the kernelization game, and some are relevant to both of these goals in improving FPT algorithms. One of the striking things about the “positive” toolkit of FPT algorithm design is how unsettled the area seems to be, with seemingly rather basic methodological insights still emerging in recent years with surprising frequency. In other words, despite 20 years of research on FPT algorithm design, the area still seems to be in its infancy.

Most of the work on FPT algorithm design to date has been theoretical, 
sometimes highly theoretical and seemingly completely impractical (such as in 
FPT results obtained by well-quasiordering methods). The last few years have 
seen two developments concerning the practical relevance of FPT algorithmics. 

First of all, there has been a surge of implementation efforts that have helped to clarify the somewhat complex relationship between the theoretical games and practical algorithms. In many cases, adaptations of the structural and algorithmic insights of the theoretical FPT results are yielding the best available practical algorithms for problems such as Vertex Cover, regardless of whether the parameter is small or not! For example, polynomial-time pre-processing rules uncovered by the kernelization game will always be useful, and are sometimes of substantial commercial interest. Similarly, clever branching strategies that are often a key part of advances in the $f(k)$ game are of universal significance with respect to backtracking heuristics for hard problems. Some striking examples of this interplay of FPT theory and practical implementations will be described by Langston in his IWPEC 2004 invited address.
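The branching strategies mentioned above refine the simplest bounded search tree for Vertex Cover, sketched below in Python. The name `vertex_cover_branch` is hypothetical, and this is not code from any of the cited works. Every edge {u, v} must have an endpoint in the cover, so the search branches on the two endpoints and decrements $k$ in each branch. The tree therefore has depth at most $k$ and at most $2^k$ leaves. With $O(m)$ work per node, where $m$ is the number of edges, this gives an $O(2^k m)$ running time, the same $2^k$ cost function as the early FPT algorithms. Improved algorithms, such as the one by Chandran and Grandoni, branch far more cleverly; this sketch does not attempt that.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def vertex_cover_branch(edges: List[Edge], k: int) -> Optional[Set[int]]:
    """Return a vertex cover of size <= k, or None if none exists.

    Plain two-way bounded search tree: at most 2^k leaves, O(m) work per node.
    """
    if not edges:
        return set()  # nothing left to cover
    if k == 0:
        return None  # edges remain but the budget is exhausted
    u, v = edges[0]
    for w in (u, v):  # one endpoint of this edge must be in any cover
        remaining = [e for e in edges if w not in e]
        sub = vertex_cover_branch(remaining, k - 1)
        if sub is not None:
            return sub | {w}
    return None
```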

Secondly, a broad realization has developed that many implemented practical algorithms for hard problems, that people are quite happy with, are actually FPT algorithms, unanalyzed and not described as such, sometimes even with quite terrible $f(k)$’s (if viewed theoretically) and employing relatively unsophisticated pre-processing. So it does seem that there is plenty of scope for theoretical research on improved FPT algorithms to have considerable impact on practical computing.
Abstract. Web search engines have emerged as one of the central applications on the Internet. In fact, search has become one of the most important activities that people engage in on the Internet. Even beyond becoming the number one source of information, a growing number of businesses are depending on web search engines for customer acquisition. In this talk I will briefly review the history of web search engines. The first generation of web search engines used text-only retrieval techniques. Google revolutionized the field by deploying the PageRank technology, an eigenvector-based analysis of the hyperlink structure, to analyze the web in order to produce relevant results. Moving forward, our goal is to achieve a better understanding of a page with a view towards producing even more relevant results.

Google is powered by a large number of PCs. Using this infrastructure and striving to be as efficient as possible poses challenging systems problems but also various algorithmic challenges. I will discuss some of them in my talk.
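For readers unfamiliar with the eigenvector-based analysis mentioned above, here is a minimal power-iteration sketch of the textbook PageRank formulation. In that formulation, ranks are the stationary distribution of a random surfer who follows a random out-link with probability `damping` and otherwise jumps to a uniformly random page. The damping value 0.85, the uniform redistribution of rank from pages without out-links, and the function name are our assumptions. This is not a description of Google's production system.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration for PageRank on a directed graph given as out-link lists."""
    nodes = sorted(set(links) | {w for outs in links.values() for w in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, outs in links.items():
            for w in outs:
                new[w] += damping * rank[v] / len(outs)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

Each iteration multiplies the rank vector by the (damped) transition matrix. The fixed point is therefore its principal eigenvector, normalized to sum to 1.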
Abstract. The ability to represent and query continuously moving objects is important in many applications of spatio-temporal database systems. In this paper we develop data structures for answering various queries on moving objects, including range and proximity queries, and study tradeoffs between various performance measures: query time, data structure size, and accuracy of results.



1 Introduction 

With the rapid advances in geographical positioning technologies, it has become 
feasible to track continuously moving objects accurately. In many applications, 
one would like to store these moving objects so that various queries on them can 
be answered quickly. For example, a mobile phone service provider may wish to 
know how many users are currently present in a specified area. Unfortunately, 
most traditional data structure techniques assume that data do not change over 
time. Therefore the problem of efficiently representing and querying moving 
objects has been extensively researched in both the database and computational 
geometry communities. Recently, several new techniques have been developed; 
see [1,7,9,15,17,22] and references therein. 

The kinetic data structure framework (KDS) proposed by Basch et al. [9] has been successfully applied to a variety of geometric problems to efficiently maintain data structures for moving objects. The key observation of KDS is that though the actual structure defined by a set of moving objects may change continuously over time, its combinatorial description changes only at discrete time instances. Although this framework has proven to be very useful for maintaining geometric structures, it is not clear whether it is the right framework if one simply wants to query moving objects. On the one hand, there is no need to maintain the underlying structure explicitly. On the other hand, the KDS maintains a geometric structure on the current configuration of objects, while one may actually wish to answer queries on the past as well as the future configurations of objects.

* Research by the first and fourth authors is supported by NSF under grants CCR-0086013, EIA-9870724, EIA-0131905, and CCR-0204118, and by a grant from the U.S.-Israel Binational Science Foundation. Research by the second author is supported by NSF under grants EIA-9972879, CCR-9984099, EIA-0112849, and INT-0129182. Research by the third author is supported by NSF under grants CCR-0093348, DMR-0121695 and CCR-0219594.

For example, one can use the KDS to maintain the convex hull of a set of linearly moving points [9]. While the points move, $O(n^2)$ events are processed when the combinatorial description of the convex hull changes, each requiring $O(\log^2 n)$ time. Using the maintained structure, one can easily determine whether a query point lies in the convex hull of the current configuration of points. But suppose one is interested in queries of the following form: “Does a point $p$ lie in the convex hull of the point set at a future time $t_q$?” In order to answer this query, there is no need to maintain the convex hull explicitly. The focus of this paper is to develop data structures for answering this and similar types of queries and to study tradeoffs between various performance measures such as query time, data structure size, and accuracy of results.
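As a baseline for the convex-hull query just described, one can evaluate every trajectory at $t_q$ and build the hull from scratch. This costs $O(n \log n)$ time per query and needs no maintained structure. The Python sketch below does exactly that, using Andrew's monotone-chain hull followed by an orientation test. It is our own illustration, not the data structure developed in this paper, and the names (`in_hull_at` and its helpers) are hypothetical. Points on the hull boundary count as inside.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def position(start: Point, velocity: Point, t: float) -> Point:
    """Position p_i(t) = start + t * velocity of a linearly moving point."""
    return (start[0] + t * velocity[0], start[1] + t * velocity[1])


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull_at(starts: Sequence[Point], velocities: Sequence[Point],
               p: Point, t_q: float) -> bool:
    """Brute-force convex-hull query: does p lie in conv(S(t_q))?"""
    hull = convex_hull([position(a, v, t_q) for a, v in zip(starts, velocities)])
    if not hull:
        return False
    if len(hull) == 1:
        return hull[0] == p
    if len(hull) == 2:  # all points collinear: test membership in the segment
        (ax, ay), (bx, by) = hull
        return (cross(hull[0], hull[1], p) == 0
                and min(ax, bx) <= p[0] <= max(ax, bx)
                and min(ay, by) <= p[1] <= max(ay, by))
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))
```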

Problem statement. Let $S = \{p_1, p_2, \ldots, p_n\}$ be a set of moving points in $\mathbb{R}^1$ or $\mathbb{R}^2$. For any time $t$, let $p_i(t)$ be the position of $p_i$ at time $t$, and let $S(t) = \{p_1(t), p_2(t), \ldots, p_n(t)\}$ be the configuration of $S$ at time $t$. We assume that each $p_i$ moves along a straight line at some fixed speed. We are interested in answering the following queries.

- Interval query: Given an interval $I = [y_1, y_2] \subseteq \mathbb{R}^1$ and a time stamp $t_q$, report $S(t_q) \cap I$, that is, all $k$ points of $S$ that lie inside $I$ at time $t_q$.

- Rectangle query: Given an axis-aligned rectangle $R \subseteq \mathbb{R}^2$ and a time stamp $t_q$, report $S(t_q) \cap R$, that is, all $k$ points of $S$ that lie inside $R$ at time $t_q$.

- Approximate nearest-neighbor query: Given a query point $p \in \mathbb{R}^2$, a time stamp $t_q$, and a parameter $\delta > 0$, report a point $p' \in S$ such that $d(p, p'(t_q)) \le (1 + \delta) \cdot \min_{q \in S} d(p, q(t_q))$.

- Approximate farthest-neighbor query: Given a query point $p \in \mathbb{R}^2$, a time stamp $t_q$, and a parameter $0 < \delta < 1$, report a point $p' \in S$ such that $d(p, p'(t_q)) \ge (1 - \delta) \cdot \max_{q \in S} d(p, q(t_q))$.

- Convex-hull query: Given a query point $p \in \mathbb{R}^2$ and a time stamp $t_q$, determine whether $p$ lies inside the convex hull of $S(t_q)$.
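The query definitions above can be pinned down by a brute-force reference implementation that evaluates every point at $t_q$ and scans all $n$ points, i.e., $O(n)$ time per query. The Python sketch below covers the interval, rectangle, nearest-neighbor, and farthest-neighbor queries. Returning the exact nearest (farthest) neighbor trivially satisfies the $(1+\delta)$ (respectively $(1-\delta)$) guarantee for every admissible $\delta$. The function names and the representation of each moving point as a (start, velocity) pair are our assumptions. The data structures in this paper exist precisely to beat this linear-scan baseline.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def at(start: Vec, velocity: Vec, t: float) -> Vec:
    """Position of a linearly moving point at time t."""
    return tuple(a + t * v for a, v in zip(start, velocity))


def interval_query(pts: Sequence[Tuple[float, float]], y1: float, y2: float,
                   t_q: float) -> List[int]:
    """Indices i with p_i(t_q) in [y1, y2]; each point is (start, velocity) in R^1."""
    return [i for i, (a, v) in enumerate(pts) if y1 <= a + t_q * v <= y2]


def rectangle_query(pts: Sequence[Tuple[Vec, Vec]], x1: float, x2: float,
                    y1: float, y2: float, t_q: float) -> List[int]:
    """Indices i with p_i(t_q) in [x1, x2] x [y1, y2]."""
    out = []
    for i, (a, v) in enumerate(pts):
        x, y = at(a, v, t_q)
        if x1 <= x <= x2 and y1 <= y <= y2:
            out.append(i)
    return out


def nearest(pts: Sequence[Tuple[Vec, Vec]], p: Vec, t_q: float) -> int:
    """Exact nearest neighbor of p at time t_q (a valid answer for any delta > 0)."""
    return min(range(len(pts)), key=lambda i: math.dist(p, at(*pts[i], t_q)))


def farthest(pts: Sequence[Tuple[Vec, Vec]], p: Vec, t_q: float) -> int:
    """Exact farthest neighbor of p at time t_q (valid for any 0 < delta < 1)."""
    return max(range(len(pts)), key=lambda i: math.dist(p, at(*pts[i], t_q)))
```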

Previous results. Range trees and kd-trees are two widely used data structures for orthogonal range queries; both of these structures have been generalized to the kinetic setting. Agarwal et al. [1] developed a two-dimensional kinetic range tree that requires $O(n \log n / \log\log n)$ space and answers queries in $O(\log n + k)$ time.¹ Agarwal et al. [4] developed kinetic data structures for two variants of the standard kd-tree that can answer queries in $O(\ldots + k)$ time. These kinetic data structures can only answer queries about the current configuration of points, that is, where $t_q$ is the current time.

¹ Some of the previous work described in this section was actually done in the standard two-level I/O model aiming to minimize the number of I/Os. Here we interpret and state their results in the standard internal-memory setting.

Kollios et al. [17] proposed a linear-size data structure based on partition trees [18] for answering interval queries with arbitrary time stamps $t_q$, in $O(\ldots + k)$ time. This result was later extended to $\mathbb{R}^2$ by Agarwal et al. [1] with the same space and query time bounds.

Agarwal et al. [1] were the first to propose time-responsive data structures, which answer queries about the current configuration of points more quickly than queries about the distant past or future. Agarwal et al. [1] developed a time-responsive data structure for interval and rectangle queries where the cost of any query is a monotonically increasing function of the difference between the query’s time stamp $t_q$ and the current time; the worst-case query cost is $O(\ldots + k)$. However, no exact bounds on the query time were known except for a few special cases.

Fairly complicated time-responsive data structures with provable query time bounds were developed later in [2]. The query times for these structures are expressed in terms of kinetic events; a kinetic event occurs whenever two moving points in $S$ momentarily share the same $x$-coordinate or the same $y$-coordinate. For a query with time stamp $t_q$, let $\varphi(t_q)$ denote the number of kinetic events between $t_q$ and the current time. The data structure uses $O(n \log n)$ space, answers interval queries in $O(\varphi(t_q)/n + \log n + k)$ time, and answers rectangle queries in $O\bigl(\sqrt{\varphi(t_q)} + (\sqrt{n}/\sqrt{\lceil \varphi(t_q)/n \rceil}) \log n + k\bigr)$ time. In the worst case, when $\varphi(t_q) = \Theta(n^2)$, these query times are no better than brute force.
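To make $\varphi(t_q)$ concrete, the following Python sketch counts kinetic events by brute force for points moving on a line, $y_i(t) = a_i + v_i t$. Points $i$ and $j$ swap order at $t = (a_j - a_i)/(v_i - v_j)$ whenever $v_i \ne v_j$. In the plane, one would apply the same count to the $x$- and $y$-coordinates separately. Several details are our assumptions: the function name, the integer inputs (used with `fractions.Fraction` so that crossing times are exact), and the convention of counting events in the closed interval between the current time 0 and $t_q$. The sketch takes $O(n^2)$ time, so it only serves as a specification of the quantity.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def kinetic_events_between(starts: Sequence[int], velocities: Sequence[int],
                           t_q: int) -> int:
    """Brute-force phi(t_q): order swaps between current time 0 and t_q (inclusive)."""
    lo, hi = sorted((0, t_q))
    count = 0
    for (a1, v1), (a2, v2) in combinations(zip(starts, velocities), 2):
        if v1 == v2:
            continue  # parallel trajectories never swap
        t = Fraction(a2 - a1, v1 - v2)  # time at which a1 + v1*t == a2 + v2*t
        if lo <= t <= hi:
            count += 1
    return count
```

For instance, points starting at 0, 10, and 100 with velocities 1, -1, and 0 swap at times 5, 100, and -90. Hence $\varphi(6) = 1$, $\varphi(200) = 2$, and $\varphi(-100) = 1$.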

Kollios et al. [16] described a data structure for one-dimensional nearest-neighbor queries, but did not give any bound on the query time. Later, an efficient structure was given by Agarwal et al. [1] to answer $\delta$-approximate nearest-neighbor queries in $O(\ldots/\sqrt{\delta})$ time using $O(n/\sqrt{\delta})$ space. Their data structure can also be used to answer $\delta$-approximate farthest-neighbor queries; however, in both cases, the approximation parameter $\delta$ must be fixed at preprocessing time. More practical data structures to compute exact and approximate $k$-nearest-neighbors were proposed by Procopiuc et al. [20].

Basch et al. [9] described how to maintain the convex hull of a set of moving points in $\mathbb{R}^2$ under the KDS framework. They showed that each kinetic event can be handled in logarithmic time, and the total number of kinetic events is $O(\ldots)$ if the trajectories are polynomials of fixed degree; the number of events is only $O(n^2)$ if points move linearly [5]. With a kinetic convex hull at hand, one can easily answer a convex-hull query in $O(\log n)$ time. However, as noted earlier, the kinetic convex hull can only answer queries on current configurations.

Our results. In this paper we present data structures for answering the five types of queries listed above. A main theme of our results is the existence of various tradeoffs: time-responsive data structures for which the query time depends on the value of $t_q$; tradeoffs between query time and the accuracy of results; and tradeoffs between query time and the size of the data structure.
In Section 2, we describe a new time-responsive data structure for interval queries in $\mathbb{R}^1$ and rectangle queries in $\mathbb{R}^2$. It uses $O(n^{1+\varepsilon})$ space and can answer queries in $O(\sqrt{\varphi(t_q)/n} + \operatorname{polylog} n + k)$ time, where $\varphi(t_q)$ is the number of events between $t_q$ and the current time. To appreciate how this new query time improves over those of previously known data structures, let us examine two extreme cases. For near-future or recent-past queries, where $\varphi(t_q) = O(n)$, previous structures answer interval and rectangle queries in $O(\log n)$ and $O(\sqrt{n} \log n)$ time respectively, while our structures answer both queries in $O(\operatorname{polylog} n)$ time. In the worst case, where $\varphi(t_q) = \Omega(n^2)$, previous structures answer queries in $O(n)$ time, which is no better than brute force, while our structures provide a query time of roughly $O(\sqrt{n})$, which is the best known for any structure of near-linear size. Using this result, we also obtain the first time-responsive structures for approximate nearest- and farthest-neighbor queries.

In Section 3, we present a data structure for answering $\delta$-approximate farthest-neighbor queries, which provides a tradeoff between the query time and the accuracy of the result. In particular, for any fixed parameter $m > 1$, we build a data structure of size $O(m)$ in $O(n + m^{\ldots})$ time so that a $\delta$-approximate farthest-neighbor query can be answered in $O(\ldots)$ time, given a parameter $\delta > 1/m^{\ldots}$ as a part of the query. Note that these bounds are all independent of the number of points in $S$.

In Section 4, we present data structures for answering convex-hull queries. We first describe data structures that produce a continuous tradeoff between space and query time, by interpolating between one data structure of linear size and another structure with logarithmic query time. We then present a data structure that provides a tradeoff between efficiency and accuracy for approximate convex-hull queries.

2 Time-Responsive Data Structures 

2.1 One-Dimensional Interval Queries 

Preliminaries. Let $L$ be a set of $n$ lines in $\mathbb{R}^2$, let $\Delta$ be a triangle, and let $r$ be a parameter between 1 and $n$. A $(1/r)$-cutting of $L$ within $\Delta$ is a triangulation $\Xi$ of $\Delta$ such that each triangle of $\Xi$ intersects at most $n/r$ lines in $L$. Chazelle [13] described an efficient hierarchical method to construct $(1/r)$-cuttings. In this approach, one chooses a sufficiently large constant $\rho$ and sets $h = \lceil \log_\rho r \rceil$. One then constructs a sequence of cuttings $\Delta = \Xi_0, \Xi_1, \ldots, \Xi_h = \Xi$, where $\Xi_i$ is a $(1/\rho^i)$-cutting of $L$. $\Xi_i$ is obtained from $\Xi_{i-1}$ by computing, for each triangle $\tau \in \Xi_{i-1}$, a $(1/\rho)$-cutting $\Xi_\tau$ of $L_\tau$ within $\tau$, where $L_\tau \subseteq L$ is the set of lines intersecting $\tau$. Chazelle [13] proved that $|\Xi_i| \le (c_1\rho)^i + c_2 \rho \mu \rho^{2i}/n^2$ for each $i$, where $c_1, c_2$ are constants and $\mu$ is the number of vertices of the arrangement of $L$ inside $\Delta$. Hence $|\Xi| = O(r^{1+\varepsilon} + \mu r^2/n^2)$ for any $\varepsilon > 0$, provided that $\rho$ is sufficiently large. This hierarchy of cuttings can be represented as a cutting tree $T$, where each node $v$ is associated with a triangle $\Delta_v$ and the subset $L_v \subseteq L$ of lines that intersect $\Delta_v$. The final cutting $\Xi$ is the set of triangles associated with the leaves of $T$. $\Xi$ and $T$ can be constructed in $O(n r^{\varepsilon} + \mu r/n)$ time.
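The defining property of a $(1/r)$-cutting is easy to check directly, even though constructing one efficiently is the hard part. The Python sketch below uses hypothetical names and represents lines as $y = m t + c$ (non-vertical, like the trajectories used below). For each triangle, it counts the lines that cross the triangle's interior and compares the count with $n/r$. It verifies only this conflict-size condition, not that the triangles actually tile $\Delta$. A line that touches a triangle only on its boundary is treated as not intersecting it, which is one common convention.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float]                  # (m, c) for the line y = m * t + c
Triangle = Tuple[Tuple[float, float], ...]  # three (t, y) vertices


def crosses_interior(line: Line, tri: Triangle) -> bool:
    """True iff the line has triangle vertices strictly on both of its sides."""
    m, c = line
    offsets = [y - (m * t + c) for t, y in tri]
    return min(offsets) < 0 < max(offsets)


def conflict_size(lines: Sequence[Line], tri: Triangle) -> int:
    """Number of lines crossing the interior of tri."""
    return sum(crosses_interior(ln, tri) for ln in lines)


def satisfies_cutting_bound(lines: Sequence[Line], triangles: Sequence[Triangle],
                            r: float) -> bool:
    """Check that every triangle meets at most n/r lines (the cutting condition)."""
    n = len(lines)
    return all(conflict_size(lines, tri) <= n / r for tri in triangles)
```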



P.K. Agarwal et al. 



High-level structure. Let $S = \{p_1, \ldots, p_n\}$ be a set of $n$ points in $\mathbb{R}^1$, each moving linearly. We regard time $t$ as an additional dimension; each moving point $p_i$ traces a linear trajectory in the $ty$-plane, which we denote $\ell_i$. Let $L = \{\ell_1, \ldots, \ell_n\}$ be the set of all $n$ trajectories, and let $\mathcal{A}(L)$ denote the arrangement of lines in $L$. Answering an interval query for interval $[y_1, y_2]$ and time stamp $t_q$ is equivalent to reporting all lines in $L$ that intersect the vertical line segment $\{t_q\} \times [y_1, y_2]$. We refer to this query as a stabbing query.

Each vertex in the arrangement $\mathcal{A}(L)$ corresponds to a change of $y$-ordering for a pair of points in $S$, or in other words, a kinetic event. As in [2], we divide the $ty$-plane into $O(\log n)$ vertical slabs as follows. Without loss of generality, suppose $t = 0$ is the current time. Consider the partial arrangement of $\mathcal{A}(L)$ that lies in the halfplane $t \ge 0$; the other side of the arrangement is handled symmetrically. For each $1 \le i \le \lceil \log_2 n \rceil$, let $\tau_i$ denote the $t$-coordinate of the $(2^i n)$th vertex to the right of the line $t = 0$, and set $\tau_0 = 0$. Using an algorithm of Brönnimann and Chazelle [10], each $\tau_i$ can be computed in $O(n \log n)$ time. For each $i$, let $W_i$ denote the half-open vertical slab $[\tau_{i-1}, \tau_i) \times \mathbb{R}$. By construction, there are at most $2^i n$ kinetic events inside $\bigcup_{j \le i} W_j$. For each slab $W_i$, we construct a separate window data structure $\mathcal{W}_i$ for answering stabbing queries within $W_i$. Our overall one-dimensional data structure consists of these $O(\log n)$ window data structures.
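The slab boundaries $\tau_i$ can be made concrete with a brute-force Python sketch. The Brönnimann–Chazelle algorithm finds each $\tau_i$ in $O(n \log n)$ time without listing the vertices. The code below instead enumerates all $O(n^2)$ vertices of $\mathcal{A}(L)$ strictly to the right of $t = 0$ and sorts them; it is only meant to show what is being computed. Trajectories are $y = a_i + v_i t$ with integer $a_i, v_i$, so crossing times are exact `Fraction`s. The function name and the convention of returning `math.inf` when fewer than $2^i n$ vertices exist are our own.

```python
import math
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence, Union


def slab_boundaries(starts: Sequence[int],
                    velocities: Sequence[int]) -> List[Union[Fraction, float]]:
    """Return [tau_0, tau_1, ..., tau_h] with h = ceil(log2 n)."""
    n = len(starts)
    times = []
    for (a1, v1), (a2, v2) in combinations(zip(starts, velocities), 2):
        if v1 != v2:
            t = Fraction(a2 - a1, v1 - v2)  # a vertex of the arrangement A(L)
            if t > 0:
                times.append(t)
    times.sort()
    taus: List[Union[Fraction, float]] = [Fraction(0)]
    for i in range(1, math.ceil(math.log2(n)) + 1):
        rank = 2 ** i * n  # tau_i is the (2^i n)-th vertex right of t = 0
        taus.append(times[rank - 1] if rank <= len(times) else math.inf)
    return taus
```

For example, with eight trajectories $y = i - i^2 t$ ($i = 0, \ldots, 7$), lines $i$ and $j$ cross at $t = 1/(i + j)$. The 16th crossing to the right of $t = 0$ is at $t = 1/7$. Only 28 crossings exist, so $\tau_2$ and $\tau_3$ are infinite.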

Window data structure for one-sided queries. Before we describe the window data structure $\mathcal{W}_i$ for answering general stabbing queries in $W_i$, we first describe a simpler structure $\mathcal{W}_i^-$ to handle half-infinite query segments of the form $\{t_q\} \times (-\infty, y]$. Equivalently, we want to report the lines of $L$ that lie below the query point $(t_q, y)$.

Set $r = n/2^i$. We construct a hierarchical $(1/r)$-cutting $\Xi$ inside $W_i$, as described at the beginning of this section. The cutting tree $T_\Xi$ of $\Xi$ forms the top part of our data structure $\mathcal{W}_i^-$; see Figure 1. Recall that any node $u$ in $T_\Xi$ is associated with a triangle $\Delta_u$ and the subset $L_u \subseteq L$ of lines that intersect $\Delta_u$. For each leaf $v$ of $T_\Xi$, we build in $O((n/r)^{1+\varepsilon})$ time a partition tree [18] for the set of points dual to the lines in $L_v$. For a query point $q = (t_q, y)$, all $k$ lines of $L_v$ lying below $q$ can be reported in $O((n/r)^{1/2} + k)$ time. These partition trees form the bottom part of $\mathcal{W}_i^-$. Let $p(u)$ denote the parent of a node $u$ in $T_\Xi$. At each node $u$ of $T_\Xi$ except the root, we store an additional set $J_u^- \subseteq L_{p(u)}$ of lines that cross $\Delta_{p(u)}$ but lie below $\Delta_u$. If $u$ is at level $j$ in the cutting tree, then $|J_u^-| \le |L_{p(u)}| \le n/\rho^{j-1}$. This completes the construction of $\mathcal{W}_i^-$.

Fig. 1. One window structure in our one-dimensional time-responsive data structure.

Let $\mu \le 2^i n$ denote the number of vertices of the arrangement $\mathcal{A}(L)$ within the slab $W_i$. The total space required by $T_\Xi$ is

$$O(n r^{\varepsilon} + \mu r/n),$$

for any constant $\varepsilon > 0$, provided that the value of $\rho$ is chosen sufficiently large. Each partition tree stored at a leaf of $T_\Xi$ requires $O((n/r)^{1+\varepsilon})$ space, and there are $O(r^{1+\varepsilon} + \mu r^2/n^2)$ such leaves. Substituting the values of $\mu$ and $r$, we conclude that the total space required by $\mathcal{W}_i^-$ is $O(n^{1+\varepsilon})$. Similar arguments imply that $\mathcal{W}_i^-$ can be constructed in $O(n^{1+\varepsilon})$ time.
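The substitution step for the space bound can be written out as follows. This expansion is ours and uses only the bounds stated in the text, $\mu \le 2^i n$, $r = n/2^i$, and $2^i < 2n$ (since $i \le \lceil \log_2 n \rceil$). For the cutting tree and for the partition trees at its leaves, respectively:

$$n r^{\varepsilon} + \frac{\mu r}{n} \;\le\; n^{1+\varepsilon} + 2^i n \cdot \frac{n/2^i}{n} \;=\; n^{1+\varepsilon} + n,$$

$$\Bigl(r^{1+\varepsilon} + \frac{\mu r^2}{n^2}\Bigr)\Bigl(\frac{n}{r}\Bigr)^{1+\varepsilon} \;=\; n^{1+\varepsilon} + \mu\, r^{1-\varepsilon} n^{\varepsilon-1} \;\le\; n^{1+\varepsilon} + 2^{i\varepsilon} n \;<\; n^{1+\varepsilon} + (2n)^{\varepsilon} n.$$

Both quantities are therefore $O(n^{1+\varepsilon})$.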

Given a query point $q = (t_q, y)$ that lies in slab $W_i$, we report the lines of $L$ below $q$ as follows. We search down the tree $\mathcal{W}_i^-$ to locate the leaf node $v$ of $T_\Xi$ whose associated triangle $\Delta_v$ contains $q$. For each ancestor $u$ of $v$, we report all lines in $J_u^-$. At the leaf node $v$, we use the partition tree of $v$ to report all lines in $L_v$ that lie below $q$. The correctness of the procedure is obvious. The total query time is $O(\log_\rho r + (n/r)^{1/2} + k) = O(\log n + 2^{i/2} + k)$, where $k$ is the number of lines reported.

Our overall data structure consists of $O(\log n)$ of these window structures. The total space and preprocessing time is $O(n^{1+\varepsilon})$; the additional time required to locate the slab containing the query point is negligible. By construction, if the query point $q$ lies in slab $W_i$, then $2^{i-1} n \le \varphi(t_q) < 2^i n$.

Lemma 1. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}$. We can preprocess $S$ in $O(n^{1+\varepsilon})$ time into a data structure of size $O(n^{1+\varepsilon})$ so that an interval query of the form $\{t_q\} \times (-\infty, y]$ can be answered in $O(\log n + (\varphi(t_q)/n)^{1/2} + k)$ time, where $k$ is the output size and $\varphi(t_q)$ is the number of kinetic events between $t_q$ and the current time.

A symmetric structure can clearly be used to answer interval queries of the form $\{t_q\} \times [y, +\infty)$.

Window structure for query segments. We now describe the window data structure $\mathcal{W}_i$ for answering arbitrary interval queries. Our structure is based on the simple observation that a line $\ell \in L$ intersects a segment $\gamma = \{t_q\} \times [y_1, y_2]$ if and only if $\ell$ intersects both $\gamma^- = \{t_q\} \times (-\infty, y_2]$ and $\gamma^+ = \{t_q\} \times [y_1, +\infty)$. Our structure $\mathcal{W}_i$ consists of two levels. The first level finds all lines of $L$ that intersect the downward ray $\gamma^-$ as a union of a few “canonical” subsets, and the second level reports the lines in each canonical subset that intersect the upward ray $\gamma^+$. See Figure 1.
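The two-level idea rests on the identity just stated. A brute-force Python sketch makes the decomposition explicit. The first filter keeps the lines on or below the point $(t_q, y_2)$, i.e., those meeting the downward ray $\gamma^-$. The second filter keeps, among those, the lines on or above $(t_q, y_1)$, i.e., those meeting $\gamma^+$. In the actual structure, the cutting tree answers the first filter as a union of canonical subsets $J_u^-$, and the secondary structures answer the second. Here both filters are linear scans, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Trajectory = Tuple[float, float]  # (a, v): the line y = a + v * t in the ty-plane


def meets_down_ray(line: Trajectory, t_q: float, y2: float) -> bool:
    """Does the line meet gamma^- = {t_q} x (-inf, y2]?"""
    a, v = line
    return a + v * t_q <= y2


def meets_up_ray(line: Trajectory, t_q: float, y1: float) -> bool:
    """Does the line meet gamma^+ = {t_q} x [y1, +inf)?"""
    a, v = line
    return a + v * t_q >= y1


def stabbing_query(lines: Sequence[Trajectory], t_q: float,
                   y1: float, y2: float) -> List[int]:
    # Level 1: lines meeting the downward ray (a cutting tree would return these
    # as a union of canonical subsets).
    level1 = [i for i, ln in enumerate(lines) if meets_down_ray(ln, t_q, y2)]
    # Level 2: keep those that also meet the upward ray.
    return [i for i in level1 if meets_up_ray(lines[i], t_q, y1)]
```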

We proceed as follows. Within each window $W_i$, we construct a cutting tree $T_\Xi$ on the trajectories $L$ as before. For each node $u \in T_\Xi$, let $J_u^- \subseteq L$ be defined as above. We construct a secondary structure $\mathcal{W}_u^+$ for $J_u^-$ so that all $k$ lines of $J_u^-$ that lie above any query point can be reported in $O(\log n + 2^{i/2} + k)$ time. Finally, for each leaf $v$ of $T_\Xi$, we construct in $O(2^i \log n)$ time a partition tree on the points dual to $L_v$, so that all $k$ lines of $L_v$ intersected by a vertical segment can be reported in $O(2^{i/2} + k)$ time.

For a query segment $\gamma = \{t_q\} \times [y_1, y_2]$, we find the slab $W_i$ that contains $\gamma$ and search $T_\Xi$ of $\mathcal{W}_i$ with the endpoint $(t_q, y_2)$. Let $v$ be the leaf of $T_\Xi$ whose triangle $\Delta_v$ contains $(t_q, y_2)$. For each ancestor $u$ of $v$, all the lines in $J_u^-$ intersect the ray $\gamma^-$, so we query the secondary structure at $u$ to report the lines in $J_u^-$ that also intersect the ray $\gamma^+$. The total query time is $O(\log^2 n + 2^{i/2} \log n + k)$. By being more careful, we can improve the query time to $O(\log^2 n + 2^{i/2} + k)$ without increasing the preprocessing time. If the query segment $\gamma$ lies in slab $W_i$, then $2^{i-1} n \le \varphi(t_q) \le 2^i n$. Thus, the query time can be rewritten as $O(\log^2 n + (\varphi(t_q)/n)^{1/2} + k)$.
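The rewriting of the query time uses only the slab invariant. Spelled out (our expansion):

$$\varphi(t_q) \ge 2^{i-1} n \;\Longrightarrow\; 2^{i/2} = \sqrt{2}\,\bigl(2^{i-1}\bigr)^{1/2} \le \sqrt{2}\,\Bigl(\frac{\varphi(t_q)}{n}\Bigr)^{1/2},$$

and hence $O(\log^2 n + 2^{i/2} + k) = O(\log^2 n + (\varphi(t_q)/n)^{1/2} + k)$. In the first slab $W_1$, where $\varphi(t_q)$ may be smaller than $n$, the term $2^{1/2}$ is a constant and is absorbed into the bound.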

Our analysis assumes that there are $\Theta(2^i n)$ events between the current time and the right boundary of each slab $W_i$. Thus, as time passes, we must occasionally update our data structure to maintain the same query times. Rather than making our structures kinetic, as in [1], we simply rebuild the entire structure after every $O(n)$ kinetic events. This approach immediately gives us an amortized update time of $O(n^{\varepsilon})$ per kinetic event; the update time can be made worst-case using standard lazy rebuilding techniques.

Putting everything together and choosing the value of $\varepsilon$ carefully, we obtain the following theorem.

Theorem 1. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}$. We can preprocess $S$ in $O(n^{1+\varepsilon})$ time into a data structure of size $O(n^{1+\varepsilon})$, such that an interval query on $S$ can be answered in $O(\log^2 n + (\varphi(t_q)/n)^{1/2} + k)$ time, where $k$ is the output size and $\varphi(t_q)$ is the number of kinetic events between the query’s time stamp $t_q$ and the current time. Moreover, we can maintain this data structure in $O(n^{\varepsilon})$ time per kinetic event.

2.2 Two-Dimensional Rectangle Queries 

We now describe how to generalize our interval-query data structure to handle two-dimensional rectangle queries. Let $S = \{p_1, \ldots, p_n\}$ be a set of $n$ points in the plane, each moving with a fixed velocity. We again regard time $t$ as an additional dimension. Let $L = \{\ell_1, \ell_2, \ldots, \ell_n\}$ be the set of linear trajectories of points of $S$ in $txy$-space. Given a time stamp $t_q$ and a rectangle $R = R_x \times R_y$, where $R_x = [x_1, x_2]$ and $R_y = [y_1, y_2]$, a rectangle query is equivalent to reporting all lines in $L$ that intersect the rectangle $\{t_q\} \times R$ in $txy$-space. For any line $\ell$, let $\ell^x$ and $\ell^y$ denote the projections of $\ell$ onto the $tx$-plane and $ty$-plane, respectively, and define $L_x = \{\ell^x_1, \ell^x_2, \ldots, \ell^x_n\}$ and $L_y = \{\ell^y_1, \ell^y_2, \ldots, \ell^y_n\}$. Each vertex of the arrangement $\mathcal{A}(L_x)$ or $\mathcal{A}(L_y)$ corresponds to a kinetic event. As for our one-dimensional structure, we divide $txy$-space into $O(\log n)$ slabs along the $t$-axis such that the number of kinetic events inside the $i$th slab $W_i$ is $O(2^{i}n)$, and build a window data structure for each $W_i$.
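The slab decomposition can be illustrated with a small bookkeeping sketch. It assumes one concrete indexing: slab $W_i$ receives the next $2^i n$ future events, so it holds $O(2^i n)$ of them. A query is then located in its slab by binary search on the slab boundaries. The function names and the indexing are our own assumptions, not taken from the paper.

```python
import bisect
from typing import List


def slab_boundaries(event_times: List[float], n: int) -> List[float]:
    """Right boundaries (in time) of slabs W_0, W_1, ... over the future events.

    Assumed indexing: slab i receives the next 2**i * n events, so it holds
    O(2**i * n) of them, and E events need O(log(E / n)) slabs.
    `event_times` must be sorted and lie after the current time.
    """
    if n < 1:
        raise ValueError("n must be positive")
    bounds: List[float] = []
    count, size = 0, n
    while count < len(event_times):
        count = min(count + size, len(event_times))
        bounds.append(event_times[count - 1])
        size *= 2
    return bounds


def slab_of_query(bounds: List[float], t_q: float) -> int:
    """Index i of the slab W_i containing t_q (len(bounds) if after the last event)."""
    return bisect.bisect_left(bounds, t_q)
```

Because $E = O(n^2)$ kinetic events can occur in total, this loop produces $O(\log n)$ slabs, matching the count stated above.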

Observe that a line $\ell$ intersects $\{t_q\} \times R$ if and only if $\ell^x$ intersects $\{t_q\} \times R_x$ and $\ell^y$ intersects $\{t_q\} \times R_y$. Thus we build a four-level data structure consisting of two one-dimensional structures described above, one on top of the other.
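The equivalence above can be checked with a brute-force $O(n)$ reference that answers each one-dimensional condition separately and intersects the results. This is only a correctness oracle under our own naming and point encoding. The multilevel structure described next avoids scanning all points.

```python
from typing import List, Sequence, Set, Tuple

# A point moving in the plane: (x0, y0, vx, vy), position (x0 + vx*t, y0 + vy*t).
MovingPoint2D = Tuple[float, float, float, float]


def interval_hits(coords: Sequence[float], lo: float, hi: float) -> Set[int]:
    """Indices whose coordinate lies in the closed interval [lo, hi]."""
    return {i for i, c in enumerate(coords) if lo <= c <= hi}


def rectangle_query(points: Sequence[MovingPoint2D], t_q: float,
                    x1: float, x2: float, y1: float, y2: float) -> List[int]:
    """Brute-force reference: points inside [x1, x2] x [y1, y2] at time t_q."""
    xs = [x0 + vx * t_q for x0, _, vx, _ in points]  # where each l^x meets t = t_q
    ys = [y0 + vy * t_q for _, y0, _, vy in points]  # where each l^y meets t = t_q
    # l meets {t_q} x R  iff  l^x meets {t_q} x R_x  and  l^y meets {t_q} x R_y
    return sorted(interval_hits(xs, x1, x2) & interval_hits(ys, y1, y2))
```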
Specifically, we first build a two-level one-dimensional window data structure $\mathcal{W}^x$ for $L_x$ as described in Section 2.1, with one minor modification: all partition trees in the structure are replaced by two-level partition trees so that a rectangle query can be answered in $O(\cdots + k)$ time, where $m$ is the number of lines on which the partition tree is built and $k$ is the output size, as described in [1]. For each node $u$ of the cutting tree in the secondary data structure of $\mathcal{W}^x$, let $L_u$ be the canonical subset of lines stored at $u$. Set $L^y_u = \{\ell^y \mid \ell^x \in L_u\}$. We then build a one-dimensional window structure $\mathcal{W}^y_u$ on $L^y_u$ and store this structure at $u$.

A rectangle query can be answered by first finding the time interval $W_i$ containing the query rectangle $\{t_q\} \times R$. We then find the lines of $L_x$ that intersect the segment $\{t_q\} \times R_x$ as a union of canonical subsets. For each node $u$ of $\mathcal{W}^x$ in the output canonical subsets, we report all lines of $L^y_u$ that intersect the segment $\{t_q\} \times R_y$ using the structure $\mathcal{W}^y_u$. Following standard analysis for multilevel data structures [3], we conclude the following.

Theorem 2. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$. We can preprocess $S$ in time $O(n^{1+\varepsilon})$ into a data structure of size $O(n^{1+\varepsilon})$ such that a rectangle query on $S$ can be answered in $O(\operatorname{polylog} n + (\varphi(t_q)/n)^{1/2} + k)$ time, where $k$ is the output size and $\varphi(t_q)$ is the number of kinetic events between the query's time stamp $t_q$ and the current time. Moreover, we can maintain this data structure in $O(n^{\varepsilon})$ time per kinetic event.

Combining this result with techniques developed in [1], we can construct time-responsive data structures for $\delta$-approximate nearest- or farthest-neighbor queries, for any fixed $\delta > 0$. Specifically, we can prove the following theorem.

Theorem 3. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$. Given a parameter $\delta > 0$, we can preprocess $S$ in time $O(n^{1+\varepsilon}/\sqrt{\delta})$ into a data structure of size $O(n^{1+\varepsilon}/\sqrt{\delta})$, such that an approximate nearest- or farthest-neighbor query on $S$ can be answered in $O((\operatorname{polylog} n + (\varphi(t_q)/n)^{1/2})/\sqrt{\delta})$ time, where $\varphi(t_q)$ is the number of kinetic events between the query's time stamp $t_q$ and the current time. Moreover, we can maintain this structure in $O(n^{\varepsilon}/\sqrt{\delta})$ time per kinetic event.

3 Approximate Farthest-Neighbor Queries 

In this section, we describe a data structure for answering approximate farthest-neighbor queries that achieves a tradeoff between efficiency and accuracy. Unlike the approximate neighbor structure described in Theorem 3, the approximation parameter $\delta$ can be specified at query time, and the cost of a query depends only on $\delta$, not on $n$. The general scheme is simple, but it crucially relies on the notion of $\varepsilon$-kernels introduced by Agarwal et al. [6].

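For a unit vector $u \in \mathbb{S}^{d-1}$, the directional width of a point set $P \subset \mathbb{R}^d$ along $u$ is defined to be $w_u(P) = \max_{p \in P} \langle u, p \rangle - \min_{p \in P} \langle u, p \rangle$. A subset $Q \subseteq P$ is called a $\delta$-kernel of $P$ if $w_u(Q) \ge (1 - \delta) \cdot w_u(P)$ for any unit vector $u \in \mathbb{S}^{d-1}$. A $\delta$-kernel of size $O(1/\delta^{(d-1)/2})$ can be computed in $O(n + 1/\delta^{d-3/2})$ time [12,24].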
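The definitions of directional width and $\delta$-kernel translate directly into code. The sketch below checks the kernel inequality only on a finite sample of directions, so it yields a necessary condition rather than a proof. It does not compute kernels: the algorithms of [12,24] are not reproduced, and the helper names are our own.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def directional_width(points: Sequence[Point], u: Sequence[float]) -> float:
    """w_u(P) = max_p <u, p> - min_p <u, p>."""
    values = [sum(ui * pi for ui, pi in zip(u, p)) for p in points]
    return max(values) - min(values)


def unit_circle_directions(k: int) -> List[Tuple[float, float]]:
    """k evenly spaced unit vectors in the plane."""
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)]


def satisfies_kernel_condition(P: Sequence[Point], Q: Sequence[Point], delta: float,
                               directions: Sequence[Sequence[float]]) -> bool:
    """Check w_u(Q) >= (1 - delta) * w_u(P) on the given directions only."""
    return all(directional_width(Q, u) >= (1 - delta) * directional_width(P, u)
               for u in directions)
```

For instance, the four corners of a unit square form a kernel of any point set inside that square, whereas the diagonal $\{(0,0), (1,1)\}$ alone fails in the direction perpendicular to it.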

Using this general result and a result by Agarwal et al. on $\delta$-kernels for moving points [6, Theorem 6.2], we can prove the following lemma.
Lemma 2. Let $S = \{p_1, \ldots, p_n\}$ be a set of $n$ linearly moving points in $\mathbb{R}^d$. For any $0 < \delta < 1$, a subset $Q \subseteq S$ of size $O(1/\delta^{d+3/2})$ can be computed in $O(n + \cdots)$ time, so that for any time $t \in \mathbb{R}$ and any point $q \in \mathbb{R}^d$, we have

$$(1 - \delta) \cdot \max_{p \in S} d(p(t), q) \le \max_{p \in Q} d(p(t), q) \le \max_{p \in S} d(p(t), q).$$

The subset $Q$ in this lemma is referred to as a $\delta$-kernel for $\delta$-approximate farthest neighbors.

As we mentioned in Section 1, for a set of $n$ linearly moving points in $\mathbb{R}^2$, one can build in $O((n \log n)/\sqrt{\delta})$ time a data structure of size $O(n/\sqrt{\delta})$ that can answer $\delta$-approximate farthest-neighbor queries in $O(n^{1/2+\varepsilon}/\sqrt{\delta})$ time, for any fixed $\delta$. By combining this result with Lemma 2 (for $d = 2$), we can construct in $O(n + 1/\delta^{\cdots})$ time a data structure of size $O(1/\delta^{4})$ so that approximate farthest neighbors can be found in $O(1/\delta^{9/4+\varepsilon})$ time for any $\varepsilon > 0$.

In order to allow $\delta$ to be specified at query time, we build data structures for several different values of $\delta$. More precisely, fix a parameter $m > 1$. For each $i$ between 0 and $\lg m$, let $S_i$ be a $\delta_i$-kernel of $S$, where $\delta_i = (2^i/m)^{1/4}$. Lemma 2 immediately implies that all $\lg m$ kernels $S_i$ can be computed in $O(n \log m + \cdots)$ time. We can reduce the computation time to $O(n + \cdots)$ by constructing the kernels hierarchically: let $S_0$ be a $\delta_0$-kernel of $S$, and for each $i > 0$, let $S_i$ be an $O(\delta_i)$-kernel of $S_{i-1}$. For each $i$, we build a data structure of size $O(|S_i|/\sqrt{\delta_i}) = O(m/2^i)$ for finding $\delta_i$-approximate farthest neighbors in the kernel $S_i$. The total size of all $O(\log m)$ data structures is $O(m)$. Given a $\delta$-approximate farthest-neighbor query, where $(1/m)^{1/4} \le \delta \le 1$, we first find an index $i$ such that $2\delta_i \le \delta < 2\delta_{i+1}$, and then query the data structure for the corresponding $S_i$. The returned point is a $(1 - (1 - \delta_i)^2)$-approximate farthest neighbor of the query point, and $1 - (1 - \delta_i)^2 \le 2\delta_i \le \delta$. The overall query time is $O(|S_i|^{1/2+\varepsilon}/\sqrt{\delta_i}) = O((1/\delta)^{9/4+\varepsilon})$.
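The parameter bookkeeping of this scheme can be sketched as follows. The sketch assumes, as in the text, $\delta_i = (2^i/m)^{1/4}$ for $i = 0, \ldots, \lfloor \lg m \rfloor$, and it uses $m/2^i$ as the per-level size bound with constants dropped. The function names are hypothetical. The sketch models only the choice of level and the size sum, not the kernels or the farthest-neighbor structures themselves.

```python
import math
from typing import List


def kernel_parameters(m: int) -> List[float]:
    """delta_i = (2**i / m) ** (1/4) for i = 0, ..., floor(lg m)."""
    return [(2 ** i / m) ** 0.25 for i in range(int(math.log2(m)) + 1)]


def structure_sizes(m: int) -> List[float]:
    """Per-level size bound m / 2**i (constants dropped); the sum stays below 2m."""
    return [m / 2 ** i for i in range(int(math.log2(m)) + 1)]


def pick_index(deltas: List[float], delta: float) -> int:
    """Largest i with 2*delta_i <= delta, so that 2*delta_i <= delta < 2*delta_{i+1}."""
    candidates = [i for i, d in enumerate(deltas) if 2 * d <= delta]
    if not candidates:
        raise ValueError("delta is below the range supported by this m")
    return candidates[-1]
```

With $m = 256$ and $\delta = 0.9$, the chosen level is $i = 3$ ($\delta_3 \approx 0.42$). The guarantee $1 - (1 - \delta_i)^2 \le 2\delta_i \le \delta$ then holds by construction.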

Theorem 4. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$. For any parameter $m > 1$, a data structure of size $O(m)$ can be built in $O(n + \cdots)$ time so that for any $(1/m)^{1/4} \le \delta \le 1$, a $\delta$-approximate farthest-neighbor query on $S$ can be answered in $O((1/\delta)^{9/4+\varepsilon})$ time.

4 Convex-Hull Queries 

In this section, we describe data structures for convex-hull queries for linearly 
moving points in the plane that provide various tradeoffs. We begin with data 
structures that provide tradeoffs between space and query time by interpolating 
between two base structures — one with linear size, the other with logarithmic 
query time. We then show how to achieve a tradeoff between efficiency and 
accuracy, following the general spirit of Section 3. 

Trading off space for query time. As in the previous section, let $\ell_i$ be the line traced by $p_i$ through $xyt$-space and let $L = \{\ell_1, \ell_2, \ldots, \ell_n\}$. We orient each line in the positive $t$-direction. Given a time stamp $t_q$ and a query point $p \in \mathbb{R}^2$, let $\hat{p}$ denote the point $(p, t_q) \in \mathbb{R}^3$ and let $\gamma$ denote the plane $t = t_q$ in $\mathbb{R}^3$. The convex-hull query now asks whether $\hat{p} \in \mathrm{conv}(\gamma \cap L)$. Observe that $\hat{p} \notin \mathrm{conv}(\gamma \cap L)$ if and only if some line $\ell \subset \gamma$ passes through $\hat{p}$ and lies outside $\mathrm{conv}(\gamma \cap L)$. Each of the two orientations of such a line $\ell$ has the same relative orientation¹ with respect to every line in $L$; see Figure 2.

Fig. 2. Plücker coordinates and convex-hull queries on moving points in $\mathbb{R}^2$.

We use Plücker coordinates to detect the existence of such a line. Plücker coordinates map an oriented line $\ell$ in $\mathbb{R}^3$ to either a point $\pi(\ell)$ in $\mathbb{R}^5$ (referred to as the Plücker point of $\ell$) or a hyperplane $\varpi(\ell)$ in $\mathbb{R}^5$ (referred to as the Plücker hyperplane of $\ell$). Furthermore, one oriented line $\ell_1$ has positive orientation with respect to another oriented line $\ell_2$ if and only if $\pi(\ell_1)$ lies above $\varpi(\ell_2)$ [14]. The following lemma is straightforward.

Lemma 3. For any plane $\gamma$ in $\mathbb{R}^3$ and any point $\hat{p} \in \gamma$, the set of Plücker points of all oriented lines that lie in $\gamma$ and pass through $\hat{p}$ forms a line in $\mathbb{R}^5$.
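The orientation test behind Plücker coordinates can be sketched in a few lines. Conventions differ between sources, so this sketch assumes the common (direction, moment) convention: the oriented line from $a$ to $b$ gets $(b - a,\ a \times b)$, and the relative orientation is the sign of the bilinear "permuted inner product" $d_1 \cdot m_2 + d_2 \cdot m_1$. Under this convention, the product equals the negative of the 4-by-4 determinant defined in the footnote. The two tests therefore agree up to one global sign. The "above/below a hyperplane" normalization used in the text is not modeled, and the helper names are our own.

```python
from typing import List, Tuple

Vec3 = Tuple[int, int, int]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> int:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plucker(a: Vec3, b: Vec3) -> Tuple[Vec3, Vec3]:
    """(direction, moment) coordinates of the oriented line from a to b."""
    return sub(b, a), cross(a, b)


def side(l1: Tuple[Vec3, Vec3], l2: Tuple[Vec3, Vec3]) -> int:
    """Permuted inner product d1.m2 + d2.m1; its sign is the relative orientation."""
    (d1, m1), (d2, m2) = l1, l2
    return dot(d1, m2) + dot(d2, m1)


def det(M: List[List[int]]) -> int:
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def orientation_det(a: Vec3, b: Vec3, c: Vec3, d: Vec3) -> int:
    """Determinant of the 4x4 matrix with rows (a, 1), (b, 1), (c, 1), (d, 1)."""
    return det([list(a) + [1], list(b) + [1], list(c) + [1], list(d) + [1]])
```

Integer coordinates keep the arithmetic exact. Reversing either line flips the sign, because `side` is linear in each argument.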

Let $H = \{\varpi(\ell_i) \mid \ell_i \in L\}$ be the set of Plücker hyperplanes in $\mathbb{R}^5$ of the oriented lines in $L$. For a query point $\hat{p} = (p, t_q)$, let $\Lambda$ denote the line in $\mathbb{R}^5$ of Plücker points of oriented lines that lie in the plane $\gamma$ and pass through $\hat{p}$. A line $\ell$ has the same relative orientation with respect to all lines in $L$ if and only if $\pi(\ell)$ lies either above every hyperplane in $H$ or below every hyperplane in $H$. Thus, our goal is to determine whether there is any Plücker point on the line $\Lambda$ that lies on the same side of every Plücker hyperplane in $H$. This can be formulated as a 5-dimensional linear-programming query, which can be solved directly using a data structure of Matoušek [19] to obtain the following result.²

Theorem 5. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$. Given a parameter $n \le s \le n^2$, a data structure of size $O(s^{1+\varepsilon})$ can be built in $O(s^{1+\varepsilon})$ time so that a convex-hull query on $S$ can be answered in $O((n/\sqrt{s}) \operatorname{polylog} n)$ time.

¹ For any four points $a, b, c, d \in \mathbb{R}^3$, the relative orientation of the oriented line from $a$ to $b$ with respect to the oriented line from $c$ to $d$ is defined by the sign of the determinant of the 4-by-4 matrix

$$\begin{pmatrix} a_x & a_y & a_z & 1 \\ b_x & b_y & b_z & 1 \\ c_x & c_y & c_z & 1 \\ d_x & d_y & d_z & 1 \end{pmatrix}.$$

² Data structures of Chan [11] and Ramos [21] can also be used for this purpose.

Logarithmic query time. If we are willing to use quadratic space, we can achieve $O(\log n)$ query time without using the complicated linear-programming query structure. Basch et al. [9] described how to kinetically maintain $\mathrm{conv}(S(t))$ as $t$ changes continuously, in $O(\log^2 n)$ time per kinetic event. Let $\mu$ denote the number of combinatorial changes to $\mathrm{conv}(S(t))$; since the points in $S$ are moving linearly, we have $\mu = O(n^2)$. The kinetic convex hull may undergo some additional internal events, but the number of internal events is also $O(n^2)$. Thus, the entire kinetic simulation takes $O(n^2 \log^2 n)$ time. Applying standard persistence techniques [23], we can record the combinatorial changes in $\mathrm{conv}(S(t))$ in $O(n + \mu)$ space, so that for any time $t_q$, the combinatorial structure of $\mathrm{conv}(S(t_q))$ can be accessed in $O(\log \mu) = O(\log n)$ time. With this combinatorial structure in hand, we can easily determine whether a point $p$ lies inside or outside $\mathrm{conv}(S(t_q))$ in $O(\log n)$ time.

Theorem 6. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$, and let $\mu = O(n^2)$ be the total number of combinatorial changes to the convex hull of $S$ over time. A data structure of size $O(n + \mu)$ can be built in $O(n^2 \log^2 n)$ time so that a convex-hull query on $S$ can be answered in $O(\log n)$ time.
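The query side of Theorem 6 can be sketched under simplifying assumptions. Assume the hull versions have already been recorded as counterclockwise vertex lists, each tagged with the time at which it becomes valid. A query first finds the version valid at $t_q$ by binary search, then runs an $O(\log h)$ point-in-convex-polygon test. For clarity, this sketch stores full copies of every version, which costs $O(n\mu)$ space. The $O(n + \mu)$ bound in the text needs the persistence techniques of [23], which are not implemented here. All names are hypothetical.

```python
import bisect
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_convex_polygon(hull: Sequence[Point], q: Point) -> bool:
    """O(log h) test for a counterclockwise hull with h >= 3; boundary counts as inside."""
    p0 = hull[0]
    if cross(p0, hull[1], q) < 0 or cross(p0, hull[-1], q) > 0:
        return False                      # outside the wedge at p0
    lo, hi = 1, len(hull) - 1
    while hi - lo > 1:                    # binary search for the fan triangle
        mid = (lo + hi) // 2
        if cross(p0, hull[mid], q) >= 0:
            lo = mid
        else:
            hi = mid
    return cross(hull[lo], hull[lo + 1], q) >= 0


def hull_query(version_times: Sequence[float], versions: Sequence[Sequence[Point]],
               t_q: float, q: Point) -> bool:
    """versions[k] is the hull on [version_times[k], version_times[k+1])."""
    k = bisect.bisect_right(version_times, t_q) - 1   # O(log mu) version lookup
    if k < 0:
        raise ValueError("t_q precedes the recorded history")
    return in_convex_polygon(versions[k], q)
```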

Trading efficiency for accuracy. Using ideas from the previous section, we can also build a data structure for approximate convex-hull queries. Recall that a subset $Q \subseteq S$ is a $\delta$-kernel of a set $S$ of moving points in $\mathbb{R}^2$ if $w_u(Q(t)) \ge (1 - \delta) \cdot w_u(S(t))$ for any time $t$ and any unit vector $u \in \mathbb{S}^1$, where $w_u(\cdot)$ is the directional width function defined in Section 3. If the points in $S$ are moving linearly, then for any $\delta$, one can compute a $\delta$-kernel of $S$ of size $O(1/\delta^{3/2})$ in $O(n + 1/\delta^{\cdots})$ time [6,12,24]. To establish a tradeoff between efficiency and accuracy for convex-hull queries, we proceed as in Section 3.

Let $m > 1$ be a fixed parameter. For all $i$ between 0 and $\lg m$, let $\delta_i = (2^i/m)^{1/3}$, and let $S_i$ be a $\delta_i$-kernel of $S$ of size $O(1/\delta_i^{3/2}) = O((m/2^i)^{1/2})$. We first compute all $S_i$'s in $O(n + m)$ time as in Section 3, and then we build the logarithmic-query structure of Theorem 6 for each $S_i$. The size of one such data structure is $O(|S_i|^2) = O(m/2^i)$, so altogether these structures use $O(m)$ space. To answer a $\delta$-approximate convex-hull query, where $(1/m)^{1/3} \le \delta \le 1$, we first find an index $i$ such that $\delta_i \le \delta < \delta_{i+1}$, and then query the corresponding data structure for $S_i$. The query time is $O(\log |S_i|) = O(\log(1/\delta))$.
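Both estimates in the preceding paragraph reduce to a geometric sum and a constant-factor comparison between $\delta_i$ and $\delta$. Spelled out, using $\delta_i = (2^i/m)^{1/3}$ and $|S_i| = O(\delta_i^{-3/2})$:

$$\sum_{i=0}^{\lg m} O\big(|S_i|^2\big) = \sum_{i=0}^{\lg m} O\big(\delta_i^{-3}\big) = \sum_{i=0}^{\lg m} O\!\left(\frac{m}{2^i}\right) = O(m).$$

Since $\delta_{i+1} = 2^{1/3}\delta_i$, the index chosen for a query satisfies $\delta_i > 2^{-1/3}\delta$. Hence

$$O(\log |S_i|) = O\big(\log \delta_i^{-3/2}\big) = O(\log(1/\delta)).$$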

Theorem 7. Let $S$ be a set of $n$ linearly moving points in $\mathbb{R}^2$. For any parameter $m > 1$, a data structure of size $O(m)$ can be built in $O(\cdots)$ time so that for any $(1/m)^{1/3} \le \delta \le 1$, a $\delta$-approximate convex-hull query on $S$ can be answered in $O(\log(1/\delta))$ time.
Abstract. There is no known algorithm that solves the general case of the approximate string matching problem with the extended edit distance, where the edit operations are insertion, deletion, mismatch, and swap, in time $o(nm)$, where $n$ is the length of the text and $m$ is the length of the pattern.

In the effort to study this problem, the edit operations were analysed independently. It turns out that the approximate matching problem with only the mismatch operation can be solved in time $O(n\sqrt{m \log m})$. If the only edit operation allowed is the swap, then the problem can be solved in time $O(n \log m \log \sigma)$, where $\sigma = \min(m, |\Sigma|)$.

In this paper we show that the approximate string matching problem with the swap and mismatch as the edit operations can be computed in time $O(n\sqrt{m \log m})$.



1 Introduction 

The last few decades have prompted the evolution of pattern matching from a 
combinatorial solution of the exact string matching problem to an area concerned 
with approximate matching of various relationships motivated by computational 
molecular biology, computer vision, and complex searches in digitized and distributed multimedia libraries [7]. To this end, two new paradigms were needed: “Generalized matching” and “Approximate matching”.

In generalized matching the input is still a text and pattern but the “match- 
ing” relation is defined differently. The output is all locations in the text where 
the pattern “matches” under the new definition of match. The different applica- 
tions define the matching relation. 

Even under the appropriate matching relation there is still a distinction be- 
tween exact matching and approximate matching. In the latter case, a distance 
function is defined on the text. A text location is considered a match if the distance between it and the pattern, under the given distance function, is within the
tolerated bounds. Below are some examples that motivate approximate match- 
ing. In computational biology one may be interested in finding a “close” muta- 
tion, in communications one may want to adjust for transmission noise, in texts 
it may be desirable to allow common typing errors. In multimedia one may want 
to adjust for lossy compressions, occlusions, scaling, affine transformations or 
dimension loss. 

The earliest and best known distance functions are Levenshtein's edit distance [11] and the Hamming distance. Let $n$ be the text length and $m$ the pattern length. Lowrance and Wagner [12,13] proposed an $O(nm)$ dynamic programming algorithm for the extended edit distance problem. In [10] the first $O(kn)$ algorithm was given for the edit distance with only $k$ allowed edit operations. Cole and Hariharan [5] presented an $O(nk^4/m + n + m)$ algorithm for this problem. To this moment, however, there is no known algorithm that solves the general case of the extended edit distance problem in time $o(nm)$.

Since the upper bound for the edit distance seems very tough to break, at- 
tempts were made to consider the edit operations separately. If only mismatches 
are counted for the distance metric, we get the Hamming distance, which de- 
fines the string matching with mismatches problem. A great amount of work was 
done on finding efficient algorithms for string matching with mismatches [1,9,3]. 
The most efficient deterministic worst-case algorithm for finding the Hamming distance of the pattern at every text location runs in time $O(n\sqrt{m \log m})$. Isolating the swap edit operation yielded even better results [2,4], with a worst-case running time of $O(n \log m \log \sigma)$.

This paper integrates the above two results. Integration has proven tricky since various algorithms often involve different techniques. For example, there are efficient algorithms for string matching with don't cares (e.g. [8]) and efficient algorithms for indexing exact matching (e.g. [14]), both over 30 years old. Yet there is no known efficient algorithm for indexing with don't cares. We provide an efficient algorithm for edit distance with two operations: mismatch and swap. Our algorithm runs in time $O(n\sqrt{m \log m})$. This matches the best known algorithm for Hamming distance matching. Indeed, a degenerate case of swap and mismatch edit distance matching is Hamming distance matching (for example, if all pattern symbols are different), so our algorithm is the best one can hope for considering the state of the art. The techniques used by the algorithm are novel cases of overlap matching and convolutions.



2 Problem Definition 

Definition 1. Let $S = s_1 \ldots s_m$ be a string over alphabet $\Sigma$. An edit operation $E$ on $S$ is a function $E : \Sigma^* \to \Sigma^*$. Let $S = s_1 \ldots s_m$ and $T = t_1 \ldots t_n$ be strings over alphabet $\Sigma$. Let $OP$ be a set of local operations. $OP$ is called the set of edit operations. The edit distance of $S$ and $T$ is the minimum number $k$ such that there exists a sequence of $k$ edit operations $(E_1, \ldots, E_k)$ for which $E_k(E_{k-1}(\cdots E_1(T) \cdots)) = S$.

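Example: The Lowrance and Wagner edit operations are $\{\mathrm{INS}_{i,\sigma}, \mathrm{DEL}_i, \mathrm{REP}_{i,\sigma}, \mathrm{SWAP}_i\}$, where

$$\begin{aligned}
\mathrm{INS}_{i,\sigma}(s_1 \ldots s_m) &= s_1 \ldots s_i, \sigma, s_{i+1} \ldots s_m,\\
\mathrm{DEL}_i(s_1 \ldots s_m) &= s_1 \ldots s_{i-1}, s_{i+1} \ldots s_m,\\
\mathrm{REP}_{i,\sigma}(s_1 \ldots s_m) &= s_1 \ldots s_{i-1}, \sigma, s_{i+1} \ldots s_m, \text{ and}\\
\mathrm{SWAP}_i(s_1 \ldots s_m) &= s_1 \ldots s_{i-1}, s_{i+1}, s_i, s_{i+2} \ldots s_m.
\end{aligned}$$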



Definition 2. Let $T = t_1 \ldots t_n$ be a text string, and $P = p_1 \ldots p_m$ be a pattern string over alphabet $\Sigma$. The edit distance problem of $P$ in $T$ is that of computing, for each $i = 1, \ldots, n$, the minimum edit distance of $P$ and a prefix of $t_i \ldots t_n$.

Lowrance and Wagner [12,13] give an $O(nm)$ algorithm for the edit distance problem with the above four edit operations. To date, no better algorithm is known for the general case. We consider the following problem.

Definition 3. The swap mismatch edit distance problem is the following.
INPUT: Text string $T = t_1 \ldots t_n$ and pattern string $P = p_1 \ldots p_m$ over alphabet $\Sigma$.

OUTPUT: For each $i = 1, \ldots, n$, compute the minimum edit distance of $P$ and a prefix of $t_i \ldots t_n$, where the edit operations are $\{\mathrm{REP}_{i,\sigma}, \mathrm{SWAP}_i\}$.

The following observation plays a role in our algorithm. 

Observation 1. Every swap operation can be viewed as two replacement operations.
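A direct $O(nm)$ reference for Definition 3 is useful when testing faster methods. Only REP and SWAP are allowed, so lengths are preserved and the relevant prefix of $t_i \ldots t_n$ has length $m$. A dynamic program over one alignment then suffices. The sketch assumes, as we take to be the intended model, that swapped pairs do not overlap, so each position takes part in at most one operation. Note that a swap costs 1, whereas the two replacements of Observation 1 would cost 2. The function names are our own.

```python
from typing import List


def swap_mismatch_distance(p: str, w: str) -> int:
    """Minimum number of REP and SWAP operations turning the window w into p.

    Assumes len(p) == len(w) and non-overlapping swaps.
    """
    if len(p) != len(w):
        raise ValueError("p and w must have equal length")
    best = [0] * (len(p) + 1)  # best[j]: cost for the first j characters
    for j in range(1, len(p) + 1):
        best[j] = best[j - 1] + (p[j - 1] != w[j - 1])          # REP (or free match)
        if (j >= 2 and p[j - 2] != p[j - 1]
                and p[j - 2] == w[j - 1] and p[j - 1] == w[j - 2]):
            best[j] = min(best[j], best[j - 2] + 1)              # one SWAP fixes two positions
    return best[-1]


def swap_mismatch_all(t: str, p: str) -> List[int]:
    """Distance of p against every length-|p| window of t (O(nm) reference)."""
    m = len(p)
    return [swap_mismatch_distance(p, t[i:i + m]) for i in range(len(t) - m + 1)]
```

For example, `aba` against `bab` costs 2: one swap plus one replacement.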

2.1 Convolutions 

Convolutions are used for filtering in signal processing and other applications. A convolution uses two initial functions, $t$ and $p$, to produce a third function $t \otimes p$. We formally define a discrete convolution.

Definition 4. Let $T$ be a function whose domain is $\{0, \ldots, n-1\}$ and $P$ a function whose domain is $\{0, \ldots, m-1\}$. We may view $T$ and $P$ as arrays of numbers, whose lengths are $n$ and $m$, respectively. The discrete convolution of $T$ and $P$ is the polynomial multiplication $T \otimes P$, where:

$$(T \otimes P)[j] = \sum_{i=0}^{m-1} T[j+i]\,P[i], \qquad j = 0, \ldots, n-m.$$

In the general case, the convolution can be computed by using the Fast Fourier Transform (FFT) [6] on $T$ and $P^R$, the reverse of $P$. This can be done in time $O(n \log m)$, in a computational model with word size $O(\log m)$.
Important Property: The crucial property contributing to the usefulness of convolutions is the following. For every fixed location $j_0$ in $T$, we are, in essence, overlaying $P$ on $T$, starting at $j_0$, i.e. $P[0]$ corresponds to $T[j_0]$, $P[1]$ to $T[j_0+1]$, ..., $P[i]$ to $T[j_0+i]$, ..., $P[m-1]$ to $T[j_0+m-1]$. We multiply each element of $P$ by its corresponding element of $T$ and add all $m$ resulting products. This is the convolution's value at location $j_0$.

Clearly, computing the convolution's value for every text location $j$ can be done in time $O(nm)$. The fortunate property of convolutions over algebraically closed fields is that they can be computed for all $n$ text locations in time $O(n \log m)$ using the FFT.
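The FFT route can be sketched in pure Python. The sketch multiplies the polynomial of $T$ by that of the reversed pattern $P^R$ over the complex numbers, using a recursive radix-2 FFT. Coefficient $m - 1 + j$ of the product is then $(T \otimes P)[j]$. This version transforms the whole text at once, so it runs in $O(n \log n)$ rather than $O(n \log m)$; the latter bound is usually obtained by processing the text in overlapping pieces of length about $2m$, which is not shown. Rounding back to integers assumes integer inputs small enough for double precision. The names are our own.

```python
import cmath
from typing import List, Sequence


def fft(a: Sequence[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [a[0]]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate(T: Sequence[int], P: Sequence[int]) -> List[int]:
    """(T (x) P)[j] = sum_i T[j+i] * P[i] for j = 0, ..., n - m, via one FFT product."""
    n, m = len(T), len(P)
    size = 1
    while size < n + m - 1:
        size *= 2
    fa = fft([complex(x) for x in T] + [0j] * (size - n))
    fb = fft([complex(x) for x in reversed(P)] + [0j] * (size - m))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    coeffs = [round((v / size).real) for v in product]  # inverse FFT, then round
    return coeffs[m - 1:n]  # coefficient m-1+j holds the value at location j
```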

In the next few sections we will be using this property of convolutions to efficiently compute relations of patterns in texts. This will be done via linear reductions to convolutions. In the definition below $\mathbb{N}$ represents the natural numbers and $\mathbb{R}$ represents the real numbers.



Definition 5. Let $P$ be a pattern of length $m$ and $T$ a text of length $n$ over some alphabet $\Sigma$. Let $R(S_1, S_2)$ be a relation on strings of length $m$ over $\Sigma$. We say that the relation $R$ holds between $P$ and location $j$ of $T$ if $R(P[0] \cdots P[m-1], T[j]T[j+1] \cdots T[j+m-1])$.

We say that $R$ is linearly reduced to convolutions if there exist a natural number $c$, a constant time computable function $f : \mathbb{N}^c \to \{0, 1\}$, and linear time functions $\ell^m_1, \ldots, \ell^m_c$ and $r^n_1, \ldots, r^n_c$, $\forall n, m \in \mathbb{N}$, where $\ell^m_i : \Sigma^m \to \mathbb{R}^m$, $r^n_i : \Sigma^n \to \mathbb{R}^n$, $i = 1, \ldots, c$, such that $R$ holds between $P$ and location $j$ in $T$ iff

$$f\big(\ell^m_1(P) \otimes r^n_1(T)[j],\ \ell^m_2(P) \otimes r^n_2(T)[j],\ \ldots,\ \ell^m_c(P) \otimes r^n_c(T)[j]\big) = 1.$$



Let $R$ be a relation that is linearly reduced to convolutions. It follows immediately from the definition that, using the FFT to compute the $c$ convolutions, it is possible to find all locations $j$ in $T$ where relation $R$ holds in time $O(n \log m)$.

Example: Let $\Sigma = \{a, b\}$ and $R$ the equality relation. The locations where $R$ holds between $P$ and $T$ are the locations $j$ where $T[j+i] = P[i]$, $i = 0, \ldots, m-1$. Fischer and Paterson [8] showed that it can be computed in time $O(n \log m)$ by the following trivial reduction to convolutions.

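Let $\ell_1 = \chi_a$, $\ell_2 = \chi_b$, $r_1 = \chi_{\bar{a}}$, $r_2 = \chi_{\bar{b}}$, where

$$\chi_\sigma(x) = \begin{cases} 1, & \text{if } x = \sigma;\\ 0, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \chi_{\bar{\sigma}}(x) = \begin{cases} 1, & \text{if } x \ne \sigma;\\ 0, & \text{otherwise,} \end{cases}$$

and where we extend the definition of the functions $\chi_\sigma$ to strings in the usual manner, i.e. for $S = s_1 s_2 \ldots s_n$, $\chi_\sigma(S) = \chi_\sigma(s_1)\chi_\sigma(s_2) \cdots \chi_\sigma(s_n)$.

Let

$$f(x, y) = \begin{cases} 1, & \text{if } x = y = 0;\\ 0, & \text{otherwise.} \end{cases}$$

Then for every text location $j$, $f(\ell_1(P) \otimes r_1(T)[j], \ell_2(P) \otimes r_2(T)[j]) = 1$ iff there is an exact matching of $P$ at location $j$ of $T$.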
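The reduction can be exercised directly. For each symbol $\sigma$, correlating $\chi_{\bar\sigma}(T)$ with $\chi_\sigma(P)$ counts the positions where the pattern has $\sigma$ but the text does not. A location matches exactly when every such count is 0, and summing the counts gives the Hamming distance. In the sketch, a naive $O(nm)$ correlation stands in for the FFT routine. The alphabet argument must contain every symbol that occurs in the pattern. The names are our own.

```python
from typing import List, Sequence


def chi(s: str, sigma: str) -> List[int]:
    """chi_sigma applied character-wise: 1 where s[k] == sigma."""
    return [1 if c == sigma else 0 for c in s]


def chi_bar(s: str, sigma: str) -> List[int]:
    """chi_{bar sigma} applied character-wise: 1 where s[k] != sigma."""
    return [1 if c != sigma else 0 for c in s]


def correlate(T: Sequence[int], P: Sequence[int]) -> List[int]:
    """(T (x) P)[j] = sum_i T[j+i] * P[i]; naive O(nm) stand-in for an FFT routine."""
    n, m = len(T), len(P)
    return [sum(T[j + i] * P[i] for i in range(m)) for j in range(n - m + 1)]


def per_symbol_counts(text: str, pattern: str, alphabet: str) -> List[List[int]]:
    """One convolution per symbol: positions where P has sigma but T does not."""
    return [correlate(chi_bar(text, s), chi(pattern, s)) for s in alphabet]


def exact_match_locations(text: str, pattern: str, alphabet: str = "ab") -> List[int]:
    """Locations j where f(...) = 1, i.e. all convolution values are 0."""
    counts = per_symbol_counts(text, pattern, alphabet)
    return [j for j, vals in enumerate(zip(*counts)) if all(v == 0 for v in vals)]


def mismatch_counts(text: str, pattern: str, alphabet: str = "ab") -> List[int]:
    """Summing the same convolutions gives the Hamming distance at each location."""
    return [sum(vals) for vals in zip(*per_symbol_counts(text, pattern, alphabet))]
```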
3 Algorithm for Binary Alphabets

For simplicity's sake we solve the problem for binary alphabets $\Sigma = \{0, 1\}$. We later show how to handle larger alphabets.

When considering a binary alphabet, a swap operation can be effective only in the cases where the text has a pair 10 aligned with an 01 in the pattern, or vice versa. Therefore, we are interested in analysing the case of alternating sequences of zeros and ones. We define this concept formally, since it is key to the algorithm's idea.

An alternating segment of a string $S \in \{0,1\}^*$ is a substring alternating between 0s and 1s. A maximal alternating segment, or segment for short, is an alternating segment such that the character to the left of the leftmost character $x$ in the alternating segment, if any, is identical to $x$, and similarly, the character to the right of the rightmost character $y$, if any, is identical to $y$.

Any string over $\Sigma = \{0, 1\}$ can be represented as a concatenation of segments.
We need to identify the cases where aligned text and pattern segments match 
via swap operations only and the cases where replacements are also necessary. 

We now show the key property necessary to reduce swap matching to overlap 
matching. To this end we partition the text and pattern into segments. 
Example: 

Let $P$ = 101000111010100110100111010101

$P$'s segments are: 1010 0 01 1 101010 01 1010 01 1 1010101.
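Partitioning a binary string into maximal alternating segments takes a single left-to-right scan, because a segment ends exactly where two equal characters are adjacent. The minimal Python sketch below reports 1-indexed (start, end) pairs, so that parities follow the odd/even location convention used in this section. The function name is our own.

```python
from typing import List, Tuple


def segments(s: str) -> List[Tuple[int, int]]:
    """Maximal alternating segments of a binary string as 1-indexed (start, end)."""
    result: List[Tuple[int, int]] = []
    start = 1
    for k in range(1, len(s)):
        if s[k] == s[k - 1]:          # equal neighbours end the current segment
            result.append((start, k))
            start = k + 1
    if s:
        result.append((start, len(s)))
    return result
```

On the example string this yields 1010, 0, 01, 1, 101010, 01, 1010, 01, 1, 1010101, whose lengths add up to 30.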

The following lemma was proven in [2].

Lemma 1. The pattern does not (swap-) match in a particular alignment if and only if there exist a segment $A$ in the text and a segment $B$ in the pattern such that: 1. The characters of $A$ and $B$ misalign in the overlap. 2. The overlap is of odd length.

The conditions of the above lemma are also useful for our problem. 

Lemma 2. The number of mismatches that are not part of a swap is exactly the number of overlaps that satisfy conditions 1 and 2 of Lemma 1.

Proof: We will examine all possibilities: 

1. Condition 1 of the lemma does not hold. Then there is no misalignment of the text; indeed, it matches the pattern.

2. Condition 1 holds but condition 2 does not. According to Lemma 1 there is a swap-match.

3. If the two conditions hold then either one of the two segments $A$ and $B$ is entirely contained in the other or the overlap is a proper substring of both segments. For the first case we may assume, without loss of generality, that segment $B$ of the pattern is contained in segment $A$ of the text (the other case is treated in a similar fashion). The situation is that there is a misalignment and the overlap length is odd. Schematically, we have (with $B$ and $A$ boldfaced):
Pattern: ···aabab···abaa···

Text: ···ababa···baba···

Since swapping $B$'s edges will not help, the only swaps possible are internal to $B$. This means that there is exactly one element that remains mismatched after the swaps.

The other situation is when the overlap is a proper substring of both segments. We assume that $B$ starts before $A$ (the other case is handled in a similar fashion). The situation is:

Pattern: abab···abaa···

Text: ·bba···baba···

Again it is clear that the only possible swaps are internal to B, leaving one 
element mismatched even after all possible swaps. □ 



Corollary 1. The outline of our algorithm is as follows:

1. For every text location $i$, count the number of mismatches $m_i$ of the pattern starting at location $i$ [1].

2. Partition the text and pattern into segments.

3. For every text location $i$, count the number $r_i$ of odd-length misaligned overlaps. This is exactly the number of "real" (non-swap) mismatches.

4. The number of swap-errors at location $i$ is $s_i = (m_i - r_i)/2$.

5. The number of swap and mismatch edit errors at location $i$ is $r_i + s_i$.
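The five steps can be checked end to end on small inputs with a brute-force version that uses no convolutions. Overlaps are enumerated pair by pair for each alignment, and the result is compared with a direct dynamic program over non-overlapping swaps and replacements. This is a testing sketch under our own naming, with 0-indexed positions. The fast algorithm replaces the double loop over segment pairs with the grouped convolutions developed below.

```python
from itertools import product
from typing import List, Tuple


def segments(s: str) -> List[Tuple[int, int]]:
    """Maximal alternating segments as 0-indexed inclusive (start, end) pairs."""
    out, start = [], 0
    for k in range(1, len(s)):
        if s[k] == s[k - 1]:
            out.append((start, k - 1))
            start = k
    if s:
        out.append((start, len(s) - 1))
    return out


def outline_distances(t: str, p: str) -> List[int]:
    """Steps 1-5 of the outline, counting overlaps by brute force."""
    n, m = len(t), len(p)
    t_segs, p_segs = segments(t), segments(p)
    result = []
    for i in range(n - m + 1):
        mis = sum(t[i + j] != p[j] for j in range(m))           # step 1: m_i
        r = 0                                                   # step 3: r_i
        for a, b in t_segs:
            for c, d in p_segs:
                lo, hi = max(a, c + i), min(b, d + i)           # overlap, text coordinates
                # two alternating segments agree or disagree on the whole overlap,
                # so its first position decides whether they are misaligned
                if lo <= hi and t[lo] != p[lo - i] and (hi - lo + 1) % 2 == 1:
                    r += 1
        assert (mis - r) % 2 == 0                               # guaranteed by Lemma 2
        result.append(r + (mis - r) // 2)                       # steps 4 and 5
    return result


def dp_distance(p: str, w: str) -> int:
    """Reference optimum over non-overlapping swaps and replacements."""
    best = [0] * (len(p) + 1)
    for j in range(1, len(p) + 1):
        best[j] = best[j - 1] + (p[j - 1] != w[j - 1])
        if (j >= 2 and p[j - 2] != p[j - 1]
                and p[j - 2] == w[j - 1] and p[j - 1] == w[j - 2]):
            best[j] = min(best[j], best[j - 2] + 1)
    return best[-1]


def check_exhaustively(max_m: int, max_n: int) -> bool:
    """Compare the outline with the reference on all binary p, t up to the given lengths."""
    for m in range(1, max_m + 1):
        for n in range(m, max_n + 1):
            for p in map("".join, product("01", repeat=m)):
                for t in map("".join, product("01", repeat=n)):
                    expected = [dp_distance(p, t[i:i + m]) for i in range(n - m + 1)]
                    if outline_distances(t, p) != expected:
                        return False
    return True
```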

3.1 Grouping Text Segments by Parity of Starting and Ending Location

We follow some of the implementation ideas of [2]. However, there it was only necessary to check existence of odd-length mismatched overlaps, and we need to count them as well. The main idea we use is to separate the segments of the text and pattern into a small number of groups. In each of these groups it will be possible to count the required overlaps in time $O(n\sqrt{m \log m})$ using a limited divide-and-conquer scheme based on polynomial multiplications (convolutions). In the subsections that follow we handle the different cases. Some of these cases necessitate new and creative uses of convolutions.

For checking the parity of the overlap, we need to know whether a text segment ends at an odd or even text location. Consequently, we define new texts where each text has exactly those segments of a given start and end parity, with all other text elements defined as φ (don't care), which consequently never contribute an error. (In the polynomial multiplication there will always be a 0 in these text locations.) Henceforth, for short, we will talk of multiplying a text and pattern, by which we mean a polynomial multiplication, where each of the text and pattern is viewed as a vector of coefficients for the corresponding polynomials.
Definition: $T^{oo}$ is a string of length $n$ where for every location $i$, if $t_i$ is in a segment whose first element is in an odd location and whose last element is in an odd location, $T^{oo}[i] = T[i]$. In all other locations $j$, $T^{oo}[j] = \phi$. $T^{ee}$ is defined in a similar fashion. We similarly define $T^{oe}$ and $T^{eo}$, but for technical reasons we need to further split these cases into $T^{oe}_1$, $T^{oe}_2$, $T^{eo}_1$ and $T^{eo}_2$. $T^{oe}_1$ contains all the odd oe-segments (the first oe-segment, the third oe-segment, etc.). $T^{oe}_2$ contains all the even oe-segments (the second oe-segment, the fourth oe-segment, etc.). We similarly define $T^{eo}_1$ and $T^{eo}_2$.

Example: $T$ = 101000111010100110100111010101.

$T$'s segments are: 1010 0 01 1 101010 01 1010 01 1 1010101

$T^{oe}_2$ = φφφφφφφφ101010φφ1010φφφφφφφφφφ

$T^{eo}_2$ = φφφφφφφφφφφφφφφφφφφφφφφφφφφφφφ



Note that the segments of $T^{oo}$ are exactly the segments of $T$ that start and end at an odd location.
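The grouping can be computed in one pass over the segments. Each segment is classified by the parity of its 1-indexed start and end. The oe-segments and eo-segments are additionally numbered in order, so that odd-numbered ones go to the first copy and even-numbered ones to the second. In this sketch the letter φ stands for the don't-care symbol, and the group keys such as `"oe2"` are our own naming.

```python
from typing import Dict, List, Tuple

PHI = "φ"  # don't-care symbol


def segments(s: str) -> List[Tuple[int, int]]:
    """Maximal alternating segments as 1-indexed inclusive (start, end) pairs."""
    out, start = [], 1
    for k in range(1, len(s)):
        if s[k] == s[k - 1]:
            out.append((start, k))
            start = k + 1
    if s:
        out.append((start, len(s)))
    return out


def parity_groups(t: str) -> Dict[str, str]:
    """T^oo, T^ee, T^oe_1, T^oe_2, T^eo_1, T^eo_2 with PHI at all other positions."""
    names = ("oo", "ee", "oe1", "oe2", "eo1", "eo2")
    groups = {name: [PHI] * len(t) for name in names}
    seen = {"oe": 0, "eo": 0}  # running count of oe- and eo-segments
    for a, b in segments(t):
        kind = ("o" if a % 2 else "e") + ("o" if b % 2 else "e")
        name = kind
        if kind in seen:
            seen[kind] += 1
            name = kind + ("1" if seen[kind] % 2 == 1 else "2")
        for k in range(a, b + 1):
            groups[name][k - 1] = t[k - 1]
    return {name: "".join(chars) for name, chars in groups.items()}
```

Every text position ends up in exactly one of the six strings, which is what allows the cases below to be counted separately and summed.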

3.2 Grouping the Pattern Segments 

The pattern segments are defined in exactly the same way as the text segments 
and we would like to group them in the same way. However, there is a difficulty 
here in “nailing down” the parity of a location, since the pattern is shifted and 
compared to every text location, i.e. the grouping of pattern segments needs to 
be related to the text parity in order to measure the parity of overlap. However, 
since the only property we have used in our grouping of text segments was the 
parity of the text location, it is clear that in all the pattern alignments the 
pattern segments that start in odd text locations are either all at odd pattern 
locations or all at even pattern locations. Similarly, these parities will be the 
same for all pattern alignments that start in even text locations. 

We are now down to the combinations $T^{t_i,t_j}$ and $P^{p_i,p_j}$. This gives us 16 cases. Many cases are similar, though, so we need to handle separately only the following three types of cases:

1. $T^{t_i,t_j}$ and $P^{p_i,p_j}$ where either $t_i = p_i$ or $t_j = p_j$. (This type covers 12 cases.) These situations are handled in Section 4.

2. $T^{t_i,t_j}$ and $P^{p_i,p_j}$ where $t_it_j = oe$ and $p_ip_j = eo$, or where $t_it_j = eo$ and $p_ip_j = oe$. These cases are handled in Section 5.

3. $T^{t_i,t_j}$ and $P^{p_i,p_j}$ where $t_it_j = oo$ and $p_ip_j = ee$, or where $t_it_j = ee$ and $p_ip_j = oo$. These cases are handled in Section 6.

4 Segments with Equal Parity Start or End 

Consider the case $T^{t_i,t_j}$ and $P^{p_i,p_j}$ where $t_i = p_i$.
Observation 2. For every two segments, $S_t$ in $T^{t_i,t_j}$ starting at location $x$, and $S_p$ in $P^{p_i,p_j}$ starting at location $y$, $|x - y|$ is always even.

We are interested in the number of odd overlaps. We now show a convolution for which the resulting value at location $i$ equals $k$ exactly when there are $k$ odd-length overlaps with the pattern starting at location $i$. (For the cases involving $T^{oe}$ or $T^{eo}$ we need to do two convolutions, the first with $T^{oe}_1$ (respectively $T^{eo}_1$) and the second with $T^{oe}_2$ (respectively $T^{eo}_2$).)

The Convolution: Pattern $P' = p'_1 \cdots p'_m$ is constructed as follows:

$$p'_k = \begin{cases} 0, & \text{if } P^{p_i,p_j}[k] = \phi;\\ 1, & \text{otherwise.} \end{cases}$$

Text $T' = t'_1 \cdots t'_n$ is constructed by replacing every φ in $T^{t_i,t_j}$ by 0, and every segment in $T^{t_i,t_j}$ by a segment of alternating 1s and −1s, starting with 1. Then $P'$ and $T'$ are convolved.

Lemma 3. Let $(T' \otimes P')[q]$ be the $q$th element in the result of the convolution. $(T' \otimes P')[q]$ is equal to the number of odd overlaps of the relevant text segments and relevant pattern segments.

Proof. This follows from the definitions of convolutions, of $T'$, and of $P'$, and from the observation that for all cases where the starting location of a pattern segment is smaller than the starting location of a text segment and the pair overlaps, the contribution to the result of the convolution will be 1 if the length of the overlap is odd and 0 if it is even (since every text segment starts with a 1 and then alternates between −1 and 1). Because of Observation 2, even when the text segment starts at a smaller location than the pattern segment, the difference between the starting locations has even length. Therefore in the area of the overlap, the text starts with a 1 and alternates between −1 and 1. Thus the convolution gives us the desired result. □

Locations where $(T' \otimes P')[q] = 0$ are locations without odd overlaps between relevant text and pattern segments.
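The key step of the proof concerns a single pair of segments. If $T'$ holds $+1, -1, +1, \ldots$ on a text segment $[x, e]$ and $P'$ holds 1 on a pattern segment $[y, f]$ with $x - y$ even, then their contribution to the convolution is 1 for an odd-length overlap and 0 otherwise. The sketch below verifies this exhaustively for small coordinates, given in text coordinates for a fixed alignment. The helper names are our own. It also shows why Observation 2 is needed: with $x - y$ odd, a contribution of $-1$ can occur.

```python
def overlap_contribution(x: int, e: int, y: int, f: int) -> int:
    """Contribution of text segment [x, e] (values +1, -1, ... from x) and
    pattern segment [y, f] (all ones) to one convolution value."""
    total = 0
    for k in range(max(x, y), min(e, f) + 1):
        total += 1 if (k - x) % 2 == 0 else -1
    return total


def check_lemma(limit: int) -> bool:
    """Contribution equals (overlap length mod 2) whenever x - y is even."""
    for x in range(limit):
        for e in range(x, limit):
            for y in range(x % 2, limit, 2):          # Observation 2: x - y even
                for f in range(y, limit):
                    length = max(0, min(e, f) - max(x, y) + 1)
                    if overlap_contribution(x, e, y, f) != length % 2:
                        return False
    return True
```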

This solves all eight cases of $T^{t_i,t_j}$ and $P^{p_i,p_j}$ where $t_i = p_i$. For the additional four cases where $t_j = p_j$ (and $t_i \ne p_i$) we simply reverse the text and pattern to obtain the case considered above.

5 The Odd-Even Even-Odd Segments 

Consider the case of $T^{oe}_{1,2}$ and $P^{eo}$ (the case of $T^{eo}_{1,2}$ and $P^{oe}$ is symmetric).

Terminology: Let $S_t$ be a text segment whose starting location is $s_t$ and whose ending location is $f_t$. Let $S_p$ be a pattern segment being compared to the text at starting position $s_p$ and ending position $f_p$. If $s_t < s_p < f_p < f_t$ then we say that $S_t$ contains $S_p$. If $s_p < s_t < f_t < f_p$ then we say that $S_p$ contains $S_t$. If $s_t < s_p < f_t < f_p$ then we say that $S_t$ has a left overlap with $S_p$. If $s_p < s_t < f_p < f_t$ then we say that $S_t$ has a right overlap with $S_p$. We will sometimes refer to a left or right overlap as a side overlap.
Observation 3. For every two segments $S_t$ in $T^{oe}$ and $S_p$ in $P^{eo}$, if either $S_p$ is contained in $S_t$ or $S_t$ is contained in $S_p$, then the overlap is of even length. If the overlap is a left overlap or right overlap, then it is of odd length. All possible cases are shown in Figure 1 below.



Fig. 1. The cases where the text segment starts at an odd location and ends at an even location, and the pattern segment does the opposite.



The correctness of the observation is immediate. Segments of these types have even length. Thus, if one contains the other, the overlap is necessarily of even length. Conversely, in the case of a side overlap, the starting and ending locations of the overlap have the same parity, making the length of the overlap odd. Recall that our goal is to count all locations where there are segments with an odd overlap. As before, we will use convolutions. The desired property for such a convolution is as follows.
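Observation 3 can also be confirmed exhaustively on small coordinates before building the convolution. The sketch below is an illustrative check, not part of the algorithm. It enumerates every text segment with an odd start and an even end and every pattern segment with an even start and an odd end, with endpoints in `1..limit`. It then verifies that containments give even overlaps and side overlaps give odd ones. The helper names are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap of closed intervals a = (s, f) and b = (s, f); 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def is_containment(t, p):
    """True if one segment contains the other, False for a left or right overlap."""
    (st, ft), (sp, fp) = t, p
    return (st <= sp and fp <= ft) or (sp <= st and ft <= fp)


def check_observation_3(limit):
    """Exhaustively test Observation 3 for all segments with endpoints in 1..limit."""
    # Text segments: odd start, even end.  Pattern segments: even start, odd end.
    text = [(s, f) for s in range(1, limit + 1, 2) for f in range(s + 1, limit + 1, 2)]
    pat = [(s, f) for s in range(2, limit + 1, 2) for f in range(s + 1, limit + 1, 2)]
    for t in text:
        for p in pat:
            length = overlap(t, p)
            if length == 0:
                continue                     # the segments do not meet
            if (length % 2 == 0) != is_containment(t, p):
                return False                 # parity contradicts the observation
    return True
```

Because the two segment types never share a starting or an ending parity, the non-strict comparisons in `is_containment` never see equal starts or equal ends.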



5.1 The Convolutions for the Odd-Even Even-Odd Segments Case 

The Convolution: Text $T' = t'_1 \cdots t'_n$ is constructed as follows:

$$t'_i = \begin{cases} 0, & \text{if } t_i = \phi, \\ 1, & \text{otherwise.} \end{cases}$$

Pattern $P' = p'_1 \cdots p'_m$ is constructed by replacing every $\phi$ in $P^{eo}$ by 0, the first and last place of every segment in $P^{eo}$ by 1, and all the other places by 0.

For every two segments $S_t$ in $T^{oe}$ and $S_p$ in $P^{eo}$, if $S_p$ is contained in $S_t$, the pair adds 2 to the convolution (both endpoints of $S_p$ fall inside $S_t$; even-length overlap). If $S_t$ is contained in $S_p$, it adds 0 (no endpoint of $S_p$ falls inside $S_t$). If the overlap is a left or right overlap, it adds 1 (exactly one endpoint of $S_p$ falls inside $S_t$), so each odd-length overlap adds 1 to the convolution.

Our convolution therefore adds 1 for each odd-length overlap and 2 for each segment of $T$ that contains a segment of $P$ ($s_t < s_p < f_p < f_t$).

All that is left to do is to count, for every text location $i$, the number of cases in which $S_t$ contains $S_p$.
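The endpoint convolution can be sanity-checked against a brute-force classification of segment pairs. The sketch below is a minimal illustration under stated assumptions, not the authors' code. Positions are 0-indexed array indices, and "odd/even" refers to those indices. Text segments start at odd and end at even indices, and pattern segments do the opposite. Only even shifts `q` are tested, so the pattern keeps its parity in text coordinates. The function names are hypothetical. The check confirms that each alignment's value equals (number of side overlaps) + 2 × (number of pattern segments strictly inside a text segment).

```python
def correlate(t, p):
    """(t ⊗ p)[q] = sum_j t[q + j] * p[j]; direct O(nm) loop for clarity."""
    n, m = len(t), len(p)
    return [sum(t[q + j] * p[j] for j in range(m)) for q in range(n - m + 1)]


def indicator(n, segments):
    """T': 1 on every cell of a text segment, 0 on phi."""
    t = [0] * n
    for s, f in segments:
        for k in range(s, f + 1):
            t[k] = 1
    return t


def endpoints(m, segments):
    """P': 1 on the first and last cell of each pattern segment, 0 elsewhere."""
    p = [0] * m
    for s, f in segments:
        p[s] = 1
        p[f] = 1
    return p


def classify(text_segs, pat_segs, q):
    """Brute force at alignment q: (number of side overlaps, number of S_p inside S_t)."""
    side = inside = 0
    for ts, tf in text_segs:
        for ps, pf in pat_segs:
            sp, fp = ps + q, pf + q              # pattern segment in text coordinates
            if max(ts, sp) > min(tf, fp):
                continue                         # no overlap
            if ts < sp and fp < tf:
                inside += 1                      # S_t contains S_p: adds 2
            elif not (sp < ts and tf < fp):
                side += 1                        # left or right overlap: adds 1
    return side, inside
```

A pair in which $S_p$ contains $S_t$ contributes nothing, which is why `classify` does not report it.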

For each segment in the text and in the pattern, we add one position on the right, so every segment becomes longer by one. Now all the segments in $T$ and $P$, which were of even length, are of odd length. Every segment in $T$ starts and ends at an odd index, and every segment in $P$ starts and ends at an even index.
Observation 4. For every two segments $S_t$ in $T^{oe}$ and $S_p$ in $P^{eo}$, if $S_p$ was contained in $S_t$, it is still contained in $S_t$. If $S_t$ was contained in $S_p$, it is still contained in $S_p$. If the overlap was a left (or right) overlap, it is still a left (or right) overlap. In the first two cases (the containments, which were even overlaps), it becomes an odd overlap. In the last two cases (the left or right overlaps, which were odd overlaps), it becomes an even overlap.

The correctness of the observation is immediate. We have four cases. In the first case, $S_p$ was contained in $S_t$. Both segments are extended by one on the right, so $f_p < f_t$ still holds and $S_p$ is still contained in $S_t$. In the second case, $S_t$ was contained in $S_p$. By the same argument $f_t < f_p$ still holds, so $S_t$ is still contained in $S_p$. In the third and fourth cases, a right (left) overlap, both segments are extended by one on the right, so the overlap stays a right (left) overlap. In every case the overlap becomes exactly one position longer, which flips its parity.
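The effect of the one-position extension can be checked the same way. The sketch below is an illustrative exhaustive test with hypothetical helper names. It takes every overlapping pair of a text segment (odd start, even end) and a pattern segment (even start, odd end). It extends both by one on the right and verifies that the relation (containment in either direction, left or right overlap) is unchanged and that the overlap grows by exactly one.

```python
def overlap(a, b):
    """Length of the overlap of closed intervals a and b; 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def relation(t, p):
    """Classify an overlapping pair of text segment t and pattern segment p."""
    (st, ft), (sp, fp) = t, p
    if st < sp and fp < ft:
        return "text contains pattern"
    if sp < st and ft < fp:
        return "pattern contains text"
    return "left" if st < sp else "right"


def check_observation_4(limit):
    """Extend every overlapping pair by one on the right and compare before/after."""
    text = [(s, f) for s in range(1, limit + 1, 2) for f in range(s + 1, limit + 1, 2)]
    pat = [(s, f) for s in range(2, limit + 1, 2) for f in range(s + 1, limit + 1, 2)]
    for t in text:
        for p in pat:
            before = overlap(t, p)
            if before == 0:
                continue
            t2, p2 = (t[0], t[1] + 1), (p[0], p[1] + 1)   # add one position on the right
            if relation(t2, p2) != relation(t, p):
                return False                               # the relation changed
            if overlap(t2, p2) != before + 1:
                return False                               # parity did not flip
    return True
```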

The new text and pattern are $T^{oo}$ and $P^{ee}$, and all that is left to do is to count the number of odd-length overlaps in the new text and pattern. This is done in the next section.



6 The Odd-Odd Even-Even Segments 

Consider the case $T^{oo}$ and $P^{ee}$ (the case $T^{ee}$ and $P^{oo}$ is symmetric).

Observation 5. For every two segments $S_t$ in $T^{oo}$ and $S_p$ in $P^{ee}$, if either $S_p$ is contained in $S_t$ or $S_t$ is contained in $S_p$, then the overlap is of odd length. If the overlap is a left overlap or right overlap, then it is of even length. All possible cases are shown in Figure 2 below.







Fig. 2. Every containment has odd length; every side overlap has even length. 



The correctness of the observation is immediate. Segments of these types have odd lengths. Thus, if one contains the other, the overlap is necessarily of odd length. Conversely, in the case of a left or right overlap, the starting and ending locations of the overlap have opposite parity, making the length of the overlap even.
6.1 The Convolutions for the Odd-Odd Even-Even Segments Case

The Convolution: Pattern $P' = p'_1 \cdots p'_m$ is constructed as follows:

$$p'_i = \begin{cases} 0, & \text{if } p_i = \phi, \\ 1, & \text{otherwise.} \end{cases}$$

Text $T' = t'_1 \cdots t'_n$ is constructed by replacing every $\phi$ in $T^{oo}$ by 0, and every segment in $T^{oo}$ by a segment of alternating 1s and $-1$s, starting with 1. Then $P'$ and $T'$ are convolved.

For every two segments $S_t$ in $T^{oo}$ and $S_p$ in $P^{ee}$, if $S_p$ is contained in $S_t$, the pair adds $-1$ to the convolution. The overlap has odd length, and since $S_p$ starts at an even location inside a segment that starts at an odd location, it starts with a $-1$. If $S_t$ is contained in $S_p$, it adds 1. If the overlap is a left or right overlap, it adds 0 to the convolution (each even-length overlap adds 0).

Our convolution adds 1 or $-1$ for each odd-length overlap (a containment) and 0 for each side overlap. Therefore, we do two convolutions for each length $l$ of segments in $P$. The first is with all the segments of $T$ whose length is shorter than $l$, and the second is with all the other segments of $T$.

A text segment shorter than $l$ cannot contain a pattern segment of length $l$, and a text segment of length at least $l$ cannot be contained in one. Hence the first convolution gives us 1 for each odd-length overlap (with the segments of $T$ that are shorter than $l$). The second convolution gives us $-1$ for each odd-length overlap (with all the other segments of $T$).

Now subtract the second convolution from the first; the result counts every odd-length overlap.
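The contributions listed above are $-1$ when $S_p$ is contained in $S_t$, $+1$ when $S_t$ is contained in $S_p$, and 0 for side overlaps. A text segment shorter than $l$ can only be contained in a length-$l$ pattern segment, and a longer one can only contain it. The sketch below therefore takes the short-segment convolution minus the long-segment one. It is a minimal illustration, not the authors' implementation. Segments are hypothetical 0-indexed `(start, end)` pairs. Text segments start and end at odd indices, and pattern segments start and end at even indices. Only even shifts are compared, so parities are preserved. The convolutions are direct loops rather than FFTs.

```python
def correlate(t, p):
    """(t ⊗ p)[q] = sum_j t[q + j] * p[j]; direct O(nm) loop for clarity."""
    n, m = len(t), len(p)
    return [sum(t[q + j] * p[j] for j in range(m)) for q in range(n - m + 1)]


def alternating(n, segments):
    """T': alternating 1, -1, 1, ... on each text segment, 0 elsewhere."""
    t = [0] * n
    for s, f in segments:
        for k in range(s, f + 1):
            t[k] = 1 if (k - s) % 2 == 0 else -1
    return t


def ones(m, segments):
    """P': 1 on each selected pattern segment, 0 elsewhere."""
    p = [0] * m
    for s, f in segments:
        for k in range(s, f + 1):
            p[k] = 1
    return p


def seg_len(seg):
    return seg[1] - seg[0] + 1


def odd_overlaps_for_length(n, m, text_segs, pat_segs, l):
    """Per alignment, odd overlaps between text segments and pattern segments of length l."""
    p = ones(m, [s for s in pat_segs if seg_len(s) == l])
    short = correlate(alternating(n, [s for s in text_segs if seg_len(s) < l]), p)
    rest = correlate(alternating(n, [s for s in text_segs if seg_len(s) >= l]), p)
    # short[q]: +1 per text segment contained in a length-l pattern segment
    # rest[q]:  -1 per length-l pattern segment contained in a text segment
    return [a - b for a, b in zip(short, rest)]


def brute_odd_overlaps(text_segs, pat_segs, q):
    """Reference count of segment pairs whose overlap at alignment q is odd."""
    count = 0
    for ts, tf in text_segs:
        for ps, pf in pat_segs:
            lo, hi = max(ts, ps + q), min(tf, pf + q)
            if hi >= lo and (hi - lo + 1) % 2 == 1:
                count += 1
    return count
```

Summing `odd_overlaps_for_length` over the distinct pattern-segment lengths gives the total number of odd overlaps at each even shift.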



7 Solution for General Alphabet 

Let $\Sigma$ be a general alphabet and let $\sigma = \min(n, |\Sigma|)$.

Let $K_i$ be the number of segments of length $i$ in $P$. For each segment length $i$ that appears in $P$, create a new pattern $P'$. The created pattern $P'$ contains all the original segments of length $i$ in $P$, and all other symbols in the pattern are replaced by $\phi$. The number of distinct lengths is $O(\sqrt{m})$, since $d$ distinct lengths sum to at least $1 + 2 + \cdots + d = d(d+1)/2$, and this sum is at most $m$.
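The per-length patterns can be sketched as follows. The sketch makes two assumptions. Pattern segments are already known as disjoint, 0-indexed `(start, end)` pairs, and `None` stands in for $\phi$. The function names are hypothetical. The first function groups the segments by length (so $K_i$ is the size of group $i$). The second builds the pattern $P'$ for one length.

```python
from collections import defaultdict

PHI = None  # stands for the don't-care symbol phi


def segments_by_length(segments):
    """Group (start, end) pattern segments by length; K_i = len(groups[i])."""
    groups = defaultdict(list)
    for s, f in segments:
        groups[f - s + 1].append((s, f))
    return dict(groups)


def pattern_for_length(pattern, segments, i):
    """P' for length i: keep the symbols of the length-i segments, phi everywhere else."""
    out = [PHI] * len(pattern)
    for s, f in segments:
        if f - s + 1 == i:
            out[s:f + 1] = pattern[s:f + 1]
    return out
```

The number `d` of keys in `segments_by_length(...)` always satisfies `d * (d + 1) // 2 <= m` for disjoint segments, which is the counting argument above.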

Solution for a specific length: 

We divide the $K_i$ segments of length $i$ into two groups according to frequency. Every segment that appears more than $\sqrt{K_i \log m}$ times in the pattern is considered frequent.
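The frequency split is easy to state in code. The sketch below makes two assumptions. The $K_i$ length-$i$ segments are given as their symbol strings (one entry per appearance), and the logarithm is base 2, since the text does not fix a base. The function name is hypothetical. The sketch also reflects the consequence used in the frequent case: there are at most $K_i / \sqrt{K_i \log m}$ frequent kinds.

```python
import math
from collections import Counter


def split_by_frequency(contents, m):
    """contents: one string per appearance of a length-i segment in the pattern.

    Returns (frequent kinds, non-frequent kinds, threshold sqrt(K_i * log2 m)).
    """
    k = len(contents)                       # K_i
    threshold = math.sqrt(k * math.log2(m))
    counts = Counter(contents)
    frequent = {s for s, c in counts.items() if c > threshold}
    return frequent, set(counts) - frequent, threshold
```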

Solution for non-frequent segments of length i

Compare each (non-frequent) segment with all the relevant text segments (a relevant segment is a segment that contains the same symbols as the pattern segment). This comparison is straightforward, since we know where each relevant text segment starts and ends. The time complexity is therefore $O(1)$ for each relevant segment.

Total Time for the Non-frequent Segments of length i: $R\sqrt{K_i \log m} \le n\sqrt{K_i \log m}$, where $R$ is the number of relevant text segments.

Solution for frequent segments of length i: We reduce the problem to a $\Sigma = \{0, 1\}$ problem. For each pair of symbols $x_0, x_1$ that are adjacent in the pattern, we create a new pattern, replacing each appearance of $x_0$ with 0 and each appearance of $x_1$ with 1. All the other symbols are replaced by $\phi$.
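The reduction to $\Sigma = \{0, 1\}$ can be sketched in a few lines. The sketch makes two choices of our own. Adjacent pairs are unordered, and pairs of equal symbols are skipped. `None` plays the role of $\phi$, and the function names are hypothetical.

```python
PHI = None  # stands for the don't-care symbol phi


def adjacent_pairs(pattern):
    """Unordered pairs of distinct symbols that appear next to each other in the pattern."""
    return {tuple(sorted((a, b))) for a, b in zip(pattern, pattern[1:]) if a != b}


def binary_pattern(pattern, x0, x1):
    """Replace x0 by 0, x1 by 1, and every other symbol by phi."""
    return [0 if c == x0 else 1 if c == x1 else PHI for c in pattern]
```

For example, `binary_pattern("abcab", "a", "b")` keeps only the `a`/`b` structure of the pattern and masks the `c`.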

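The time complexity for a specific length $i$ in $\Sigma = \{0,1\}$ is $n \log m$ per frequent segment kind. Since every frequent segment kind appears at least $\sqrt{K_i \log m}$ times, there are at most $K_i / \sqrt{K_i \log m} = \sqrt{K_i / \log m}$ frequent kinds. The time complexity is therefore $n \log m \sqrt{K_i / \log m} = n\sqrt{K_i \log m}$.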

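Total Time:

$$
\begin{aligned}
\text{Time} &\le \sum_{i=1}^{\sqrt{m}} n\sqrt{K_i \log m}
 = n\sqrt{\log m}\,\sum_{i=1}^{\sqrt{m}} \sqrt{i K_i}\cdot\frac{1}{\sqrt{i}} \\
&\le n\sqrt{\log m}\,\sqrt{\sum_{i=1}^{\sqrt{m}} i K_i}\;\sqrt{\sum_{i=1}^{\sqrt{m}} \frac{1}{i}}
 \le n\sqrt{\log m}\,\sqrt{m}\,\sqrt{\log m} = n\sqrt{m}\,\log m .
\end{aligned}
$$

Here the second step is the Cauchy–Schwarz inequality. The last step uses $\sum_i iK_i \le m$ (the total length of the segments of $P$ is at most $m$) and, up to a constant factor, $\sum_{i=1}^{\sqrt{m}} 1/i \le \log m$.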
Abstract. We introduce a problem directly inspired by its application to DWDM (dense wavelength division multiplexing) network design. We are given a set of demands to be carried over a network. Our goal is to choose a route for each demand and to decompose the network into a collection of edge-disjoint simple paths. These paths are called optical line systems. The cost of routing one unit of demand is the number of line systems with which the demand route overlaps; our design objective is to minimize the total cost over all demands. This cost metric is motivated by the need to avoid O-E-O (optical-electrical-optical) conversions in optical transmission as well as to minimize the expense of the equipment necessary to carry the traffic.

For given line systems, it is easy to find the optimal demand routes. 
On the other hand, for given demand routes designing the optimal line 
systems can be NP-hard. We first present a 2-approximation for general 
network topologies. As optical networks often have low node degrees, we 
also offer an algorithm that finds the optimal solution for the special case 
in which the node degree is at most 3. 

If neither demand routes nor line systems are fixed, the situation becomes 
much harder. Even for a restricted scenario on a 3-regular Hamiltonian 
network, no efficient algorithm can guarantee a constant approximation 
better than 2. For general topologies, we offer a simple algorithm with an O(log K)- and an O(log n)-approximation, where K is the number of demands and n is the number of nodes. For rings, a common special
topology, we offer a 3/2-approximation. 



1 Introduction 

The constantly changing technology continues to present algorithmic challenges in optical network design. For example, the SONET (Synchronous Optical NETwork) technology has motivated a rich body of combinatorial work on rings, e.g. [4,10,19,20]; and the WDM (wavelength division multiplexing) technology has inspired numerous algorithms in coloring and wavelength assignment, e.g. [11,12,15,17]. In this paper we introduce a novel problem directly motivated by its applications to DWDM (dense WDM) network design. At the essence of this problem, however, lies a fundamental graph theory question. Our problem involves partitioning a graph into edge-disjoint paths with certain properties while minimizing a natural, yet so far largely unstudied, cost function.

The state-of-the-art DWDM technology allows more than one hundred wave- 
lengths to be multiplexed on a single fiber strand and allows signals to travel 
thousands of kilometers optically. One design methodology in building an optical 
network using DWDM technology is to partition the network into a collection 
of paths which are called optical line systems. Such path decompositions have 
many advantages. For example, the linear structure simplifies wavelength assignment [21] and engineering constraints [13], and requires less sophisticated hardware
components. Therefore line-based optical networks are quite popular in today’s 
optical industry. 

In a line-based network an optical signal is transmitted within the optical 
domain as long as it follows a line system. However, when it switches from one 
line system to another, an O-E-O (optical-electrical-optical) conversion takes place. Such conversions are slow, expensive, and defeat the advantages of optical transmission. Therefore, our first design objective is to avoid O-E-O conversions
as much as possible. 

Naturally, our second objective is to design networks of low cost. As is usual 
in the case of optical network design, we cannot control the physical layout of 
the network since optical fibers are already installed underground. We are given 
a pre-existing network of dark fiber, i.e. optical fiber without any hardware for 
sending, receiving, or converting signals. The hardware cost consists of two parts, 
the base cost and the transport cost. The base cost refers to the cost of the equip- 
ment necessary to form the line systems. An optical line system begins with an 
end terminal, followed by an alternating sequence of fibers and OADMs (optical add/drop multiplexers), and terminates with another end terminal. Roughly
speaking, an OADM is built upon two back-to-back end terminals. Instead of 
terminating an optical signal an OADM allows the signal to pass through within 
the optical domain. An OADM costs about twice as much as an end terminal [7]. 
As a result, independent of how the network is partitioned into line systems, the 
base cost stays more or less the same. We can therefore ignore the base cost 
in our minimization. The transport cost refers to the cost of transmitting signals, in particular the cost incurred by the equipment needed to perform O-E-O conversions. Once the signal is formatted in the optical form, it is transmitted along a line system for free. The number of O-E-O converters for a signal is linearly proportional to the number of line systems that the signal travels through. Therefore, minimizing O-E-O conversions serves the dual purpose of network
cost minimization and providing (nearly) all-optical transmission. 

Our Model. Let us describe the model in more detail. We are given a dark fiber 
network and a set of demands to be carried over the network. This network is 
modeled by an undirected graph G = (V,E). Demands are undirected as well. 
Each demand is described by a source node and a destination node and carries 
one wavelength (or one unit) of traffic. We can have duplicate demands with the 
same source-destination pair. Let K be the sum of all demands. There are two 
design components: i) partitioning the edge set E into a set of edge-disjoint line
systems, and ii) routing each demand. 

A routing path of a demand consists of one or more transparent sections, 
where each transparent section is the intersection of the routing path and a line 
system. It is worth pointing out here that if multiple units of demand switch 
from one line system to another, each demand unit requires its individual O-E-O
converter. Therefore, our objective is minimizing the total number of transparent 
sections summed over all demands, which is equivalent to minimizing the network 
equipment cost and minimizing O-E-0 conversions. 

Figure 1 gives a dark fiber network and two of the many possible line system 
designs. Suppose one unit of demand travels from A to E and three units of 
demand travel from C to E. The only possible routing paths are ABCDE and CDE, respectively. The solution on the left results in 1 × 4 + 3 × 2 = 10 transparent sections and the solution in the middle results in only 1 × 1 + 3 × 2 = 7 transparent sections.




Fig. 1. (Left) A dark fiber network. Each link is a separate line system. (Middle) 
ABCDF is a line system and DE is a line system. The arrows stand for end terminals 
and solid boxes stand for OADMs. (Right) An improper line system ABCFEDCG. 



One complication in routing can arise under the following circumstances. 
When an optical signal travels between two nodes on a line system it stays 
within the optical domain only if it follows the line system path defined by the 
orientation of the OADMs. Otherwise, O-E-O conversions may take place. For example, the line system path in Figure 1 (Right) is ABCFEDCG. To travel between B and G, the direct path BCG is expensive since it does not follow the line system and incurs an O-E-O conversion at node C; the “free” path BCFEDCG is non-simple since it visits C twice. Naturally, network carriers
avoid expensive demand routes. On the other hand, they also need simple de- 
mand routes for reasons such as easy network management. We therefore restrict 
ourselves to line systems that induce simple demand routes only and call such 
line systems proper. More precisely, a proper line system corresponds to a path 
in which each node appears at most once, excluding its possible appearance(s) 
as the end points. 

We also emphasize that wavelength assignment is not an issue in this con- 
text. A feasible wavelength assignment requires a demand to be on the same 
wavelength when it stays on one line system. However, it is free to switch to 
a different wavelength whenever it hops onto a different line system. Naturally, 
for wavelength division multiplexing, two demands sharing a link need to have distinct wavelengths. Given such constraints, as long as the number of demands per link respects the fiber capacity, i.e. the number of wavelengths per fiber, a feasible wavelength assignment can be found efficiently using the simple method for interval graph coloring. (See e.g. [6].) In this paper we assume infinite fiber capacity,
since the capacity of a single fiber strand is as large as one terabit per second. 
From now on, we focus on our objective of cost minimization without having to 
consider wavelengths. 
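To see why wavelength assignment is easy here, model the demands on a single line system as intervals of consecutive links. Assigning wavelengths is then interval graph coloring. Coloring greedily by left endpoint and reusing released colors needs exactly as many wavelengths as the maximum link load. The sketch below is a minimal illustration under that model (the interval encoding and function names are hypothetical), not the method of [6].

```python
import heapq


def assign_wavelengths(intervals):
    """Greedy interval-graph colouring of the demands on one line system.

    intervals[k] = (a, b) means demand k uses links a, a+1, ..., b-1 (a < b).
    Returns one wavelength index per demand.
    """
    order = sorted(range(len(intervals)), key=lambda k: intervals[k][0])
    active = []   # heap of (end, wavelength) for demands occupying the current link
    free = []     # heap of released wavelengths, reused smallest first
    next_wl = 0
    wl = [None] * len(intervals)
    for k in order:
        a, b = intervals[k]
        while active and active[0][0] <= a:      # these demands ended before link a
            _, w = heapq.heappop(active)
            heapq.heappush(free, w)
        if free:
            w = heapq.heappop(free)
        else:
            w, next_wl = next_wl, next_wl + 1    # all wavelengths busy: open a new one
        wl[k] = w
        heapq.heappush(active, (b, w))
    return wl


def max_load(intervals):
    """Largest number of demands sharing a single link."""
    events = sorted([(a, 1) for a, _ in intervals] + [(b, -1) for _, b in intervals])
    load = best = 0
    for _, delta in events:    # at equal positions the -1 (end) events come first
        load += delta
        best = max(best, load)
    return best
```

Each line system is coloured independently, which matches the observation that a demand may change wavelength when it hops onto another line system.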

Our Results. In this paper we begin in Section 2 with optimizing line system 
design assuming simple demand routes are given. (We note that for given line sys- 
tems finding the optimal routing paths is trivial.) For an arbitrary network with 
node degree higher than a certain constant c, finding the optimal line systems 
that minimize the total number of transparent sections is NP-hard. Fortunately, 
optical networks typically are sparse. Perhaps as our most interesting result, 
when the network has maximal node degree at most 3, we present a polynomial 
time algorithm that finds an optimal solution. For an arbitrary network topol- 
ogy, we also present a 2-approximation algorithm. This approximation is the 
best possible with the particular lower bound that we use. 

In Section 3, we then focus on the general case in which neither routes nor 
line systems are given. We first show that this general case is much harder. In 
fact, even for a very restricted case on a 3-regular Hamiltonian network, no algo- 
rithm can guarantee a constant approximation better than 2. We then present a 
simple algorithm that guarantees an O(log n)- and O(log K)-approximation for
arbitrary networks, where n is the number of nodes in the network and K is the 
number of demands. 

In Section 3.2 we focus on the ring topology. Rings are of particular inter- 
est since often the underlying infrastructure is a ring for metro-area networks. 
We give a 3/2-approximation. If the total demand terminating at each node is 
bounded by a constant, we present an optimal solution. 

In the interest of space, we omit most of the proofs. The proofs, as well as a 
discussion on the relation of our problem to supereulerian graphs, can be found 
in the full version of this paper at [1]. 

Related Work. Until now, the design of line-based networks has only been studied 
empirically, e.g. [5,7]. However, the problem of ATM virtual path layout (VPL) 
is relevant. VPL aims to find a set of virtual paths in the network such that 
any demand route can be expressed as a concatenation of virtual paths, see 
e.g. [22,3]. The model for VPL differs from ours in several aspects. For example, 
our line systems form a partition of the network links whereas virtual paths are 
allowed to overlap. Also, our demand routes can enter and depart from any point 
along a line system whereas demand routes for VPL have to be concatenations 
of entire virtual paths. This makes a huge difference in the model, since the 
extra constraint that all routes must be concatenations of whole line systems 
can increase the optimal solution by a factor of K. Finally, we are concerned 
with minimizing the total number of transparent sections, while VPL papers are 
traditionally concerned with minimizing the maximum hop count, with notable
exceptions such as [9]. 

2 Designing Line Systems with Specified Routing 

First we would like to point out that if line systems are given, it is easy to find the 
optimal route between any two nodes. The optimal route between any two nodes 
is the one that goes through as few line systems as possible and thus results in 
as few transparent sections as possible. We can find this route by calculating the 
shortest path according to an appropriate distance function. 
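One possible "appropriate distance function" is the following. This is our assumption, not necessarily the authors' choice. Connect two nodes whenever they lie on a common line system and run breadth-first search. A shortest path with k hops then corresponds to a route with k transparent sections. The sketch counts transparent sections only and ignores the simplicity and OADM-orientation issues discussed earlier. Line systems are hypothetical lists of node labels.

```python
from collections import deque


def min_transparent_sections(line_systems, src, dst):
    """Fewest transparent sections from src to dst, or None if dst is unreachable.

    line_systems: list of node sequences.  Two nodes are one hop apart when they
    lie on a common line system, so a BFS hop count is a transparent-section count.
    """
    through = {}                     # node -> indices of line systems containing it
    for idx, ls in enumerate(line_systems):
        for v in ls:
            through.setdefault(v, set()).add(idx)
    dist = {src: 0}
    expanded = set()                 # each line system only needs to be scanned once
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for idx in through.get(u, ()):
            if idx in expanded:
                continue
            expanded.add(idx)
            for v in line_systems[idx]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return None
```

With every link of the path A-B-C-D-E as its own line system, one unit of demand A→E plus three units C→E costs 1 × 4 + 3 × 2 = 10 sections, as in the left solution of Figure 1.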

If instead we are given the demand routes, and our goal is to find the optimal 
set of proper line systems, the problem becomes NP-hard. In fact, as we show 
in [1], even for networks with bounded degree the problem remains NP-hard.
Therefore, we consider approximation algorithms. 

2.1 2-Approximation for Proper Line Systems

We describe a configuration at each node, i.e. how the links incident to the node 
are connected to one another by line systems. For node u, let E(u) be the set of links incident to u. Each link e ∈ E(u) is matched to at most one other link in E(u). If e is matched to f ∈ E(u), it means e and f are on the same line system connected by an OADM at u. If e is not matched to any link in E(u), it means
e terminates a line system at u. It is easy to see that the node configurations 
imply a set of paths that partition the network links. Of course, these resulting 
paths may not correspond to proper line systems. 

Suppose every demand route is given. We offer a natural algorithm to configure each node. For every pair (u, v) and (v, w) of adjacent links, we define the through traffic T(uvw) as the number of demand units routed along u, v and w.
For each node v, we match the links incident to v such that the total through 
traffic along the matched link pairs is maximized. We refer to this algorithm 
as Max Thru. Although Max Thru may define closed loops or improper line 
systems, it has the following property. 

Lemma 1. For given demand routes, the total number of transparent sections 
generated by Max Thru is a lower bound on the cost of the optimal feasible 
solution. 
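Max Thru solves, at every node, a maximum-weight matching of the incident links with weights T(uvw). The sketch below assumes routes are given as node lists. It uses a brute-force matching that is only practical for small node degree; a general maximum-weight matching algorithm would replace it in practice. The helper `count_sections` evaluates the number of transparent sections a configuration induces, which is the quantity Lemma 1 bounds. All names are hypothetical.

```python
def through_traffic(routes):
    """T[(min(u, w), v, max(u, w))] = demand units routed along u, v, w."""
    T = {}
    for route in routes:
        for u, v, w in zip(route, route[1:], route[2:]):
            key = (min(u, w), v, max(u, w))
            T[key] = T.get(key, 0) + 1
    return T


def best_matching(links, weight):
    """Brute-force maximum-weight matching of the links at one node (small degree only)."""
    if len(links) < 2:
        return 0, []
    first, rest = links[0], links[1:]
    best_val, best_pairs = best_matching(rest, weight)       # leave `first` unmatched
    for i, other in enumerate(rest):
        val, pairs = best_matching(rest[:i] + rest[i + 1:], weight)
        val += weight(first, other)
        if val > best_val:
            best_val, best_pairs = val, [(first, other)] + pairs
    return best_val, best_pairs


def max_thru(adj, routes):
    """Node configurations: config[v] = set of matched link pairs (a, b) with a < b."""
    T = through_traffic(routes)
    config = {}
    for v, neighbours in adj.items():
        def w(a, b, v=v):
            return T.get((min(a, b), v, max(a, b)), 0)
        _, pairs = best_matching(sorted(neighbours), w)
        config[v] = set(pairs)
    return config


def count_sections(config, routes):
    """1 section per route plus 1 per interior node whose two route links are unmatched."""
    total = 0
    for route in routes:
        total += 1
        for u, v, w in zip(route, route[1:], route[2:]):
            if (min(u, w), max(u, w)) not in config.get(v, set()):
                total += 1
    return total
```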

To find proper line systems given simple demand routes, we begin with the 
Max Thru algorithm and obtain a set of line systems that are not necessarily 
proper. If a line system is proper, we leave it as is. Otherwise, we cut the line 
system as follows. We traverse the line system from one end to the other and 
record every node that we visit in a sequence. If the line system is a closed loop 
we start from an arbitrary node and finish at the same node. (For example, the 
node sequence for the line system drawn in Figure 1 (Right) is ABCFEDCG.)
If a node u appears multiple times in the node sequence, we mark the first 
appearance of u with an open parenthesis “(”, the last appearance of u with 
a closed parenthesis “)”, and every other appearance of u with a closed and an open parenthesis “)(”. All these parentheses are labelled u. We match the
ith open parenthesis labelled with u with the ith closed parenthesis also labelled 
with u. Each matching pair of parentheses represents a section of the line system 
that would induce non-simple routing paths. We put down parentheses for every
node that appears multiple times in the sequence. 

We use these parentheses to determine where to cut an improper line system. 
We say two parentheses form an inner-most pair if the left parenthesis is “(” , the 
right one is “)” and they do not contain any other parenthesis in between. Note 
the two parentheses in an inner-most pair do not have to have the same label. 
We now find the pair of inner-most parentheses that is also leftmost, and cut
the sequence at the node v where the closed parenthesis from the selected pair 
sits. We remove every matching pair of parentheses that contains v. We repeat 
the above process until no parentheses are left. We refer to this algorithm as 
Cut Paren, which has 2 desirable properties as stated in Lemmas 2 and 3. 
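The following is a compact sketch of Cut Paren on a single node sequence, under our reading of the description. The sequence is a string or list of node labels. Matching the ith “(” of a label with its ith “)” pairs consecutive appearances of the same node, so each pair is stored as the positions (a, b) of two consecutive appearances. A cut at the node in position c removes every pair with a ≤ c ≤ b. The `is_proper` helper encodes the properness condition as "interior nodes are distinct", since end points may repeat. All names are hypothetical.

```python
def cut_paren(seq):
    """Split one (possibly improper) line system, given as a node sequence."""
    last, pairs = {}, []
    for pos, u in enumerate(seq):
        if u in last:
            pairs.append((last[u], pos))     # consecutive appearances of u
        last[u] = pos
    cuts = []
    while pairs:
        # "(" at a, ")" at b; at one position ")" sorts before "(" (the ")(" mark)
        tokens = sorted([(a, 1, "(") for a, _ in pairs] + [(b, 0, ")") for _, b in pairs])
        for (_, _, k1), (p2, _, k2) in zip(tokens, tokens[1:]):
            if k1 == "(" and k2 == ")":
                c = p2                       # leftmost inner-most pair: cut at its ")"
                break
        cuts.append(c)
        pairs = [(a, b) for a, b in pairs if not (a <= c <= b)]
    pieces, start = [], 0
    for c in sorted(cuts):
        pieces.append(seq[start:c + 1])      # the cut node ends one piece ...
        start = c                            # ... and starts the next
    if len(seq) - start >= 2:
        pieces.append(seq[start:])
    return pieces


def is_proper(piece):
    """Each node appears at most once, not counting appearances as end points."""
    interior = piece[1:-1]
    return len(set(interior)) == len(interior)
```

On the improper line system ABCFEDCG of Figure 1 (Right), the only repeated node is C, and the sketch cuts at its second appearance.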

Lemma 2. The Cut Paren algorithm defines a set of proper line systems. 

Lemma 3. Given simple routing paths, each transparent section defined by Max Thru is cut into at most 2 pieces by Cut Paren.

By Lemma 2 Cut Paren defines a set of proper line systems. By Lemmas 1 
and 3 we know that Cut Paren is at most twice as expensive as any optimal 
solution. Hence, 

Theorem 1. Given simple routing paths, Cut Paren is a 2-approximation algorithm.

Using the lower bound given by Lemma 1, no algorithm can beat the ap- 
proximation ratio of 2 since the optimal solution can cost twice as much as the 
infeasible solution generated by Max Thru. For example, this can happen on a 
cycle with a unit demand between any two neighboring nodes u and v and the 
demand being routed along the “long” path excluding link uv. 

So far we have presented an algorithm that can cut an improper line system 
with a guaranteed approximation ratio of 2. In fact, we can always cut an im- 
proper line system L optimally as shown in [1]. Unfortunately, we do not know 
if this can lead to a better approximation ratio than 2. However, we have an 
example that shows that even if we cut the improper line systems generated by 
Max Thru optimally, we can still end up with a solution that costs 5/4 as much 
as the optimal solution with proper line systems. 

2.2 Networks of Node Degree 3 

The situation is far better when the network only has nodes of low degree. In 
particular, if the network has maximal degree of at most 3, we can indeed find 
an optimal set of proper line systems for any given simple demand routes. This 
is fortunate since optical networks are typically sparse, and only have a small 



34 



E. Anshelevich and L. Zhang 



number of nodes of degree greater than 3. Again, we begin with Max Thru. Since the node degree is at most 3, we observe that a resulting line system is improper only if it is a simple closed loop, i.e. a closed loop that induces simple routes only. We now examine each node u on the loop, which we call L. Let x and y be u's neighboring nodes on the loop L and z be u's neighboring node off the loop (if such a z exists). Without loss of generality, let us assume the through traffic T(xuz) is no smaller than the through traffic T(yuz). (If z does not exist, then T(xuz) = T(yuz) = 0.) An alternate configuration at u would be to cut xuy at u and connect xuz, and we refer to this operation as a swap. This swap would increase the total number of transparent sections by I(u), where I(u) = T(xuy) − T(xuz). If node u' ∈ L has the minimum increase I(u'), then we perform the swap operation at u'. We swap exactly one node on each closed loop. We refer to this algorithm as Greedy Swap. Since every node has degree at most 3, Greedy Swap eliminates all loops and never creates new ones. Therefore, the Greedy Swap algorithm defines a set of proper line systems.
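The choice of the swap node on one closed loop can be sketched as follows. Through traffic is supplied as a lookup function, and the increase at u is computed as T(xuy) minus the larger of T(xuz) and T(yuz). That larger term corresponds to connecting z to whichever loop neighbor carries more traffic with it. The loop is given in cyclic order, and the dictionary of off-loop neighbors and all function names are hypothetical.

```python
def make_T(table):
    """Through-traffic lookup T(a, u, b) from a dict keyed by (min(a, b), u, max(a, b))."""
    def T(a, u, b):
        return table.get((min(a, b), u, max(a, b)), 0)
    return T


def swap_increase(T, x, u, y, z):
    """I(u): sections added by cutting x-u-y at u and connecting z to the better neighbour."""
    gain = max(T(x, u, z), T(y, u, z)) if z is not None else 0
    return T(x, u, y) - gain


def greedy_swap(loop, off_loop, T):
    """Return the node of a closed Max Thru loop at which Greedy Swap performs its swap."""
    k = len(loop)
    best = None
    for i, u in enumerate(loop):
        x, y = loop[i - 1], loop[(i + 1) % k]        # u's neighbours on the loop
        inc = swap_increase(T, x, u, y, off_loop.get(u))
        if best is None or inc < best[0]:
            best = (inc, u)
    return best[1]
```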

Theorem 2. The Greedy Swap algorithm is optimal for given demand routing 
on networks with node degree at most 3. 

Proof. We first note that no matter how Greedy Swap breaks ties in decision 
making the resulting solutions have the same total number of transparent sec- 
tions. Given a solution by an optimal algorithm Opt, we now show that Greedy 
Swap produces an identical solution under some tie breaking. In particular, when 
executing Greedy Swap let us assume that whenever the algorithm is presented 
with a tie-breaking choice, it always arranges a node to be configured in the same 
way as in Opt, if possible. 




Fig. 2. (Left) OPT. (Right) Result of Greedy Swap. 



Now let us compare the line systems produced by Greedy Swap against 
the optimal solution. We first examine each node u for which Max Thru and 
Greedy Swap have the same configuration. We claim that Opt has the same 
configuration at u as well. Let us focus on the case in which u has three neighbors x, y and z. (The cases in which u has degree 1 or 2 are simpler.) If Max Thru does not connect xuy, xuz or yuz, Opt must have the same configuration at u since the through traffic T(xuy), T(xuz) and T(yuz) must be all zero and Opt is used
for tie breaking. Otherwise, let us assume without loss of generality that Max 
Thru connects xuy. For the purpose of contradiction, let us assume that Opt 
connects xuz. By the construction of Max Thru and the tie-breaking rule, we have T(xuy) > T(xuz). If Opt reconfigures node u by connecting xuy instead, a loop L must be formed, or else we would have a better solution than Opt (see Figure 2). On the loop L there must be a node v ≠ u such that Greedy Swap
and Opt configure v differently, since otherwise Greedy Swap would contain 
this loop L. Let w be the first such node, i.e. all nodes on L between u and w in 
the clockwise direction are configured the same by Greedy Swap and Opt. 

If w has the same configuration in Max Thru and Greedy Swap, then 
by the definition of Max Thru and by the tie breaking rule, the configuration 
of w with Max Thru must be strictly better than its configuration with Opt. 
Hence, by reconfiguring Opt at nodes u and w like Greedy Swap, we get a 
better solution than Opt which is a contradiction. Therefore, as shown in Figure 
2, w must be a node that was cut by Greedy Swap on a loop generated by 
Max Thru that included u, which implies that I(w) ≤ I(u). In fact, it must be the case that I(w) < I(u) by the tie-breaking rule. Therefore, if we reconfigure
Opt at nodes u and w like Greedy Swap, we once again get a better solution 
than Opt. This is a valid solution, since we cannot create any loops in Opt by 
reconfiguring both u and w. Hence, if Max Thru and Greedy Swap have the 
same configuration for node u, Opt has the same configuration at u as well. 

We finally examine each node u that Max Thru and Greedy Swap con- 
figure differently. This node u can only exist on a closed loop L created by Max 
Thru and each closed loop can only have one such node. For every node v ∈ L with v ≠ u, Max Thru and Greedy Swap configure v in the same way by the construction of Greedy Swap. Hence, by our argument above, Opt configures v in the same way as well. Since Opt contains no loops, it has to configure node
u like Greedy Swap. □ 

3 The General Case 

The general case in which neither demand routes nor line systems are given 
is much harder. In contrast to Theorem 2, the general case is NP-hard even for
restricted instances on 3-regular networks. Furthermore, we also have a stronger 
result as stated in Theorem 3. The proof of this theorem uses a result by Bazgan, 
Santha, and Tuza [2], which states that the longest path problem cannot be
approximated within a constant factor in 3-regular Hamiltonian graphs. 

Theorem 3. If routes are not specified, no algorithm can guarantee a constant 
approximation ratio better than 2 for our problem even if the network is 3-regular 
and Hamiltonian and even if all demands originate from the same source node. 

3.1 A Logarithmic Approximation 

In this section we present a logarithmic approximation for the general case. Let 
T be any tree that spans all source nodes and destination nodes. We choose an 
arbitrary node r in T to be the root of the tree. We route every demand from 
its source to r and then to its destination along edges of T. 
We arrange the line systems as follows. Consider a node y, and let z be its
parent node. If x is the child node of y that has the largest subtree then the only 
line system through node y is along zyx. The number of transparent sections 
from any node to the root r is at most log n where n is the number of nodes in 
T, since whenever the number of transparent sections increases by 1, the size of 
the subtree is cut by at least a half (also true because the path decomposition 
we have formed is a caterpillar decomposition [14]). Therefore, for any demand 
to go from its source node to its destination node the number of transparent 
sections is at most 2 log n. 
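The tree construction can be sketched directly. The tree is a hypothetical `children` dictionary with root `root`. The heavy child of a node is the child with the largest subtree, with ties broken by list order. Every tree edge gets a line-system id, and the sections from a node to the root are counted as the distinct ids on its root path. The test checks the halving argument in the form $1 + \log_2 n$, which also counts the section that starts at the root. A demand then needs at most twice that many sections, one root path up and one down. All names are hypothetical.

```python
import math


def line_systems_on_tree(children, root):
    """Assign a line-system id to every tree edge (child -> parent).

    At each non-root node y with parent z, the edge to the child x with the largest
    subtree continues the line system z-y-x; every other child edge starts a new one.
    """
    size = {}

    def compute(v):                  # subtree sizes (recursive; fine for small trees)
        size[v] = 1 + sum(compute(c) for c in children.get(v, []))
        return size[v]

    compute(root)
    heavy = {v: max(cs, key=lambda c: size[c]) for v, cs in children.items() if cs}
    parent, edge_id, next_id = {}, {}, 0
    stack = [root]
    while stack:
        p = stack.pop()
        for c in children.get(p, []):
            parent[c] = p
            if p != root and heavy[p] == c:
                edge_id[c] = edge_id[p]       # same line system as the edge above p
            else:
                edge_id[c] = next_id          # light edge, or an edge at the root
                next_id += 1
            stack.append(c)
    return parent, edge_id


def sections_to_root(v, root, parent, edge_id):
    """Number of distinct line systems on the path from v up to the root."""
    ids = []
    while v != root:
        if not ids or ids[-1] != edge_id[v]:
            ids.append(edge_id[v])
        v = parent[v]
    return len(ids)


def max_sections(children, root):
    parent, edge_id = line_systems_on_tree(children, root)
    return max((sections_to_root(v, root, parent, edge_id) for v in parent), default=0)


def within_log_bound(children, root):
    """Check the halving argument in the form: sections <= 1 + log2(n)."""
    n = 1 + sum(len(cs) for cs in children.values())
    return max_sections(children, root) <= 1 + math.log2(n)
```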

If we consider the “size” of a subtree as the number of destination nodes in 
the subtree rooted at x, then the same construction of the line systems as above 
would bound the maximum number of transparent sections by 2 log(2K), where
K is the total number of demands. Since the total cost of any solution must be 
at least K, we have the following theorem. 

Theorem 4. For the general case with unspecified demand routes, we can find 
a 2 log n-approximation and a 2 log(2K)-approximation in polynomial time. 

Notice that this approximation ratio is tight with respect to the lower 
bound K that we are using, as evidenced by a balanced binary tree with one 
unit demand between the root and every leaf node. 

3.2 Rings 

The ring topology is of particular interest since the underlying infrastructure of 
metro-area networks is often a ring [18]. In this context, we have a set of core 
nodes connected in a ring and each core node is attached to an access node, 
as in Figure 3(Left). Among other things, the access nodes have the capability 
of multiplexing low-rate local traffic streams into high-rate optical signals. Such 
access nodes are particularly useful when end users are far away from the core 
ring. In this ring network, all demands are between access nodes. It is easy to see 
that each demand has two routes, each access node has one configuration and 
each core node has three configurations which completely determine the layout 
of the line systems. Under this constrained scenario, it is reasonable to expect 
that we can do much better than the logarithmic approximation of Section 3.1. 
Indeed, we obtain the following result. 

Theorem 5. In ring networks, with no multiple demands between the same node 
pair, we can find a 3/2-approximation in polynomial time. 

The proof of this theorem is somewhat involved and can be found in the full 
version of this paper [1]. Here we give its general idea. 

To prove the above theorem we show that either we can find an optimal 
solution efficiently or the whole ring solution serves as a 3/2-approximation. In 
the whole ring solution, all the core nodes form a single line system around the 
ring, which is cut open at an arbitrary core node. It is easy to see that in the 
whole ring solution we can route the demands so that each demand requires 3 
transparent sections (one on each access link and one along the ring, routing the demand in the direction that avoids the cut). 

Fig. 3. (Left) A ring network. (Right) A consistent routing of S and the sets $S_j$. 

Before we describe our algorithm, we describe crossing and non-crossing 
demands and their properties. For a demand that starts at v' and ends 
at w', we refer to v and w (the corresponding core nodes) as the terminals of this 
demand. We call a pair of demands a crossing pair if the four terminal nodes are 
distinct and each demand has exactly one terminal lying between the two 
terminals of the other demand. Now consider a pair of non-crossing demands. If 
their routes do not overlap, or overlap in one contiguous section only, then we say 
the demand paths are consistent. If the demand paths of all pairs of non-crossing 
demands are consistent, we say the solution has consistent routing. We establish 
the following two properties about routing non-crossing demands consistently. 
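To make the definition of a crossing pair concrete, here is a small Python check. For illustration only, it assumes that the core nodes are numbered 0, ..., n−1 clockwise and that a demand is given by its two terminals; the function names are ours.

```python
def strictly_between(a, b, x, n):
    """True if core node x lies strictly inside the clockwise arc from a to b."""
    return 0 < (x - a) % n < (b - a) % n

def is_crossing(d1, d2, n):
    """Crossing pair: four distinct terminals, and the terminals of d2 lie on
    different arcs of the ring cut at the terminals of d1."""
    (a, b), (c, d) = d1, d2
    if len({a, b, c, d}) < 4:
        return False
    return strictly_between(a, b, c, n) != strictly_between(a, b, d, n)

print(is_crossing((0, 4), (2, 6), 8))   # True: terminals interleave
print(is_crossing((0, 4), (1, 3), 8))   # False: terminals are nested
```

The test is symmetric in the two demands, since with four distinct terminals the pair (c, d) separates (a, b) exactly when (a, b) separates (c, d).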

Lemma 4. There is an optimal solution, which we call OPT, for which the routing 
is consistent. 

Lemma 5. For a set of mutually non-crossing demands S, the number of ways 
to route the demands consistently is linear in |S|. 

The Algorithm. We now describe our 3/2-approximation algorithm. From the set 
of demands, we take out crossing pairs of demands, one pair at a time in arbitrary 
order, until we have no crossing pairs left. Let C be the set of crossing pairs of 
demands that we have removed, and S be the set of non-crossing demands that 
are left. By Lemma 4 we know that OPT uses consistent routing, and 
by Lemma 5 we can efficiently enumerate all possible consistent routings for 
demands in S. Hence, if C is empty we know how to find an optimal solution 
and we are done. Otherwise, we assume from now on that we know how OPT 
consistently routes the demands in S (we can afford to try every candidate, since by Lemma 5 there are only linearly many). We also know that the routing of S results 
in a collection of disjoint intervals $I_1, I_2, \ldots$ around the ring, where each interval 
$I_j$ consists of links that are on the routes of demands of S. Let $S_j \subseteq S$ be the set 
of demands whose routes are included in $I_j$, as illustrated in Figure 3 (Right). 
We denote by $c(S_j)$ the total number of transparent sections that the demands 
in the set $S_j$ use in OPT. Each set $S_j$ obeys the following lemma. 
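The first two steps of the algorithm, removing crossing pairs and grouping the routed demands of S into intervals, can be sketched in Python as follows. This is only an illustration under our own conventions: core nodes are numbered 0, ..., n−1 clockwise, link k joins nodes k and k+1 (mod n), and a route is a clockwise arc (start, end), so the counterclockwise route from a to b is the arc (b, a). The routes themselves would come from the enumeration of Lemma 5, which is not reproduced here.

```python
def is_crossing(d1, d2, n):
    """Four distinct terminals, with exactly one terminal of d2 on each arc of d1."""
    (a, b), (c, d) = d1, d2
    if len({a, b, c, d}) < 4:
        return False

    def inside(x):                      # strictly inside the clockwise arc a -> b
        return 0 < (x - a) % n < (b - a) % n

    return inside(c) != inside(d)

def split_crossing(demands, n):
    """Remove crossing pairs one at a time until none remain; return (C, S)."""
    S, C = list(demands), []
    while True:
        pair = next(((i, j) for i in range(len(S)) for j in range(i + 1, len(S))
                     if is_crossing(S[i], S[j], n)), None)
        if pair is None:
            return C, S
        i, j = pair
        C.append((S[i], S[j]))
        del S[j], S[i]                  # j > i, so deleting j first keeps i valid

def intervals(routes, n):
    """Group routes into maximal runs of used links (the intervals I_j).

    Returns one list per interval: the indices of the routes forming S_j."""
    used = [False] * n
    for s, e in routes:
        k = s
        while k != e:                   # links s, s+1, ..., e-1 (mod n)
            used[k] = True
            k = (k + 1) % n
    if all(used):                       # a single interval covering the whole ring
        return [list(range(len(routes)))]
    label, groups, current = [None] * n, 0, None
    start = used.index(False)           # start after an unused link: no run wraps
    for step in range(1, n + 1):
        k = (start + step) % n
        if used[k]:
            if current is None:
                current, groups = groups, groups + 1
            label[k] = current
        else:
            current = None
    members = [[] for _ in range(groups)]
    for idx, (s, e) in enumerate(routes):
        if s != e:
            members[label[s]].append(idx)
    return members
```

For example, `split_crossing([(0, 4), (2, 6), (1, 3)], 8)` moves the interleaved pair into C and leaves `[(1, 3)]` in S.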

Lemma 6. We have $c(S_j) \ge 2|S_j| - 1$. Equality holds only if at each 
border node s of $I_j$, the configuration of s is $s'sv$ where $v \notin I_j$. Finally, if $I_j$ is 
the entire ring, then $c(S_j) \ge 2|S_j|$. 
If $c(S_j) = 2|S_j| - 1$ and the interval $I_j$ causes no extra jumps for any demand 
in C, we call the set $S_j$ thrifty. We can prove the following key lemma. 

Lemma 7. If OPT does not contain a thrifty set, the whole ring solution is a 
3/2-approximation. If OPT contains a thrifty set, then we can find the routing 
of all demands of C in OPT. 

Using this lemma, we consider each set $S_j$ and find a routing of all demands 
assuming it is thrifty. Given the routes of all demands, we can find the optimal 
line systems by Theorem 2. We also keep the whole ring solution, which has a cost of 3K, 
as a candidate for the case that none of the $S_j$ is thrifty. Having obtained a total of $O(x)$ 
solutions, where x is the number of sets $S_j$, we choose the least expensive one 
among them. If OPT contains no thrifty set, the solution 
we have found is no worse than the whole ring solution, which guarantees a 3/2-
approximation by Lemma 7. On the other hand, if OPT contains some thrifty set, 
our enumeration considers this set, so the solution we return is 
optimal. This proves Theorem 5. 

This finishes our 3/2-approximation algorithm for rings. However, if the de- 
mand originating at every node of the ring is bounded by some constant k, we can 
find the optimal solution in polynomial time, even if the same source-destination 
pair can have multiple demands. 

4 Variations and Open Problems 

In this paper, we introduced a new cost metric which measures the cost of routing 
a demand by the number of line systems that the demand travels through. We 
presented a collection of results. However, many questions remain open and some 
variations deserve further attention. 

The algorithm we have presented for line system partitioning with given 
demand routes guarantees a 2-approximation. However, we have no example in 
which our algorithm gives an approximation ratio worse than 5/4, or any reason 
to believe that a different algorithm may not do even better. If the demand routes 
are not given, we only have an $O(\log n)$-approximation that compares against a 
rather trivial lower bound, K, on OPT. In fact, we know that on a binary tree 
OPT can be as much as $\Theta(K \log n)$. Therefore, a much better approximation 
seems likely. The only inapproximability result says that the approximation ratio 
cannot be better than 2, so even a small constant approximation may be possible. 

It is worth noting that our model makes several simplifications. Most notably, 
we assume that optical fibers have infinite capacity and that line systems can 
have arbitrary length. It would be quite interesting to introduce one or both of 
these complications into our model and see if good algorithms are still possible. 
Abstract. In this paper we study the external memory planar point enclosure problem: Given N axis-parallel rectangles in the plane, construct 
a data structure on disk (an index) such that all K rectangles containing 
a query point can be reported I/O-efficiently. This problem has important 
applications in e.g. spatial and temporal databases, and is dual to the 
important and well-studied orthogonal range searching problem. Surprisingly, we show that one cannot construct a linear sized external memory 
point enclosure data structure that can be used to answer a query in 
$O(\log_B N + K/B)$ I/Os, where B is the disk block size. To obtain this 
bound, $\Omega(N/B^{1-\epsilon})$ disk blocks are needed for some constant $\epsilon > 0$. With 
linear space, the best obtainable query bound is $O(\log_2 N + K/B)$. To 
show this we prove a general lower bound on the tradeoff between the 
size of the data structure and its query cost. We also develop a family of 
structures with matching space and query bounds. 

1 Introduction 

In this paper we study the external memory planar point enclosure problem: 
Given N axis-parallel rectangles in the plane, construct a data structure on disk 
(an index) such that all K rectangles containing a query point q can be reported 
I/O-efficiently. This problem is the dual of the orthogonal range searching prob- 
lem, that is, the problem of storing a set of points in the plane such that the 
points in a query rectangle can be reported I/O-efficiently. The point enclosure 
problem has important applications in temporal databases; for example, if we 
have a database where each object is associated with a time span and a key 
range, retrieving all objects with key ranges containing a specific key value at a 
certain time corresponds to a point enclosure query. Also, in spatial databases, 
irregular planar objects are often represented in a data structure by their min- 
imal bounding boxes; retrieving all bounding boxes that contain a query point, 
i.e. a point enclosure query, is then the first step in retrieving all objects that 
contain the query point. Point enclosure can also be used as part of an algorithm 
for finding all bounding boxes intersecting a query rectangle, since such 
a query can be decomposed into a point enclosure query, a range query, and 
two segment intersection queries. While a lot of work has been done on developing 
worst-case efficient external memory data structures for range searching 
problems (see e.g. [3,17] for surveys), the point enclosure problem has only been 
considered in some very restricted models of computation [1,14]. In this paper 
we prove a general lower bound on the tradeoff between the size of an external 
memory point enclosure structure and its query cost, and show that this tradeoff 
is asymptotically tight by developing a family of structures with matching 
space and query bounds. Surprisingly, our results show that the point enclosure 
problem is harder in external memory than in internal memory. 

* Supported in part by the National Science Foundation through RI grant EIA- 

1.1 Previous Work 

The B-tree [13] is the most fundamental external memory data structure. It 
uses $O(N/B)$ disk blocks of size B to store N elements and can be used to 
answer a one-dimensional range query in $O(\log_B N + K/B)$ I/Os; an I/O is the 
movement of one disk block between disk and main memory. These bounds are 
optimal in comparison-based models of computation, and they are the bounds 
we would like to obtain for more complicated external memory data structure 
problems. In the study of such problems, a number of different lower bound 
models have been developed in recent years: the non-replicating index model [21], 
the external memory pointer machine model [25], the bounding-volume hierarchy 
model [1], and the indexability model [20,19]. The most general of these models 
is the indexability model of Hellerstein, Koutsoupias, and Papadimitriou [20,19]. 
To our knowledge, it captures all known external data structures for reporting 
problems (that is, problems where the output is a subset of the objects in the 
structure). In this model, the focus is on bounding the number of disk blocks 
containing the answers to a query q, given a bound on the number of blocks 
required by the data structure. The cost of computing which blocks to access to 
answer the query (the search cost) is ignored. More formally, an instance of a 
problem is described by a workload W, which is a simple hypergraph (O, Q), 
where O is the set of N objects, and Q is a set of subsets of O. The elements of 
$Q = \{q_1, \ldots, q_m\}$ are called queries. An indexing scheme S for a given workload 
W = (O, Q) is a B-regular hypergraph $(O, \mathcal{B})$, where each element of $\mathcal{B}$ is a 
B-subset of O (called a block), and where $O = \bigcup \mathcal{B}$. An indexing scheme S can 
be thought of as a placement of the objects of O on disk pages, possibly with 
redundancy. The cost of answering a query q is $\min\{|\mathcal{B}'| : \mathcal{B}' \subseteq \mathcal{B},\ q \subseteq \bigcup \mathcal{B}'\}$, 
i.e., the minimum number of blocks whose union contains all objects in the 
query. The efficiency of an indexing scheme is measured using two parameters: 
its redundancy r and its access overhead A. The redundancy r is defined to be 
the average number of copies of an object stored in the indexing scheme, i.e. 
$r = B|\mathcal{B}|/N$, and the access overhead A is defined to be the worst-case ratio 
between the cost of a query and the ideal cost $\lceil |q|/B \rceil$ (where |q| denotes the 
query output size). In other words, an indexing scheme with redundancy r and 
access overhead A occupies $r\lceil N/B \rceil$ disk blocks and any query q is covered by 
at most $A\lceil |q|/B \rceil$ disk blocks. 
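To make the two efficiency measures concrete, the following Python sketch computes the redundancy and the access overhead of a toy indexing scheme directly from the definitions. The helper names and the example workload are hypothetical, and the exhaustive set-cover search is exponential, so it is only meant for tiny inputs.

```python
from itertools import combinations
from math import ceil

def query_cost(query, blocks):
    """min{|B'| : B' a subset of blocks whose union contains the query}."""
    query = set(query)
    if not query:
        return 0
    for k in range(1, len(blocks) + 1):
        for chosen in combinations(blocks, k):
            if query <= set().union(*chosen):
                return k
    raise ValueError("query is not covered by the blocks")

def redundancy(blocks, B, N):
    """Average number of copies per object: r = B * |blocks| / N."""
    return B * len(blocks) / N

def access_overhead(queries, blocks, B):
    """Worst-case ratio between the query cost and the ideal cost ceil(|q|/B)."""
    return max(query_cost(q, blocks) / ceil(len(q) / B) for q in queries if q)

# Toy workload: N = 8 objects, blocks of size B = 2; objects 1, 2, 5, 6 stored twice.
B, N = 2, 8
blocks = [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {1, 2}, {5, 6}]
queries = [{1, 2}, {0, 1, 2, 3}, {1, 2, 5, 6}, {3, 4}]
print(redundancy(blocks, B, N), access_overhead(queries, blocks, B))   # prints 1.5 2.0
```

Here the query {3, 4} has ideal cost 1 but needs two blocks, which is what sets A = 2; adding a block {3, 4} would lower A to 1 at the price of raising r to 1.75.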
Following the introduction of the indexability model, a sequence of results 
by Koutsoupias and Taylor [22], Samoladas and Miranker [24] and Arge, Samoladas, 
and Vitter [6] proved a tradeoff of $r = \Omega(\log(N/B)/\log A)$ for the (two-dimensional) 
orthogonal range searching problem in the model. Using this tradeoff, 
Arge, Samoladas and Vitter [6] proved that $\Omega\left(\frac{N}{B}\cdot\frac{\log(N/B)}{\log\log_B N}\right)$ space is needed to 
design an indexing scheme that answers queries in $O(\log_B N + K/B)$ I/Os, i.e., 
that two-dimensional range searching is harder than the one-dimensional case. 
They also designed a structure with matching bounds; note that this structure 
works in the classical I/O-model [2] where the search cost is considered. 
A similar bound was proven in the external memory pointer machine model by 
Subramanian and Ramaswamy [25], and the result resembles the internal memory 
pointer machine case, where $\Theta(N\log N/\log\log N)$ space is needed to obtain 
$O(\log N + K)$ query cost [11]. If only $O(N/B)$ space can be used, $\Theta((N/B)^{\epsilon})$ I/Os 
are needed to answer a query [6]. If exactly $\lceil N/B \rceil$ space is allowed, $\Theta(\sqrt{N/B})$ 
I/Os are needed [21]. 

Compared to orthogonal range searching, relatively little is known about the 
dual point enclosure problem. Arge and Vitter [7] designed a linear space external 
interval tree structure for the one-dimensional version of the problem where 
the data is intervals. This structure answers point enclosure queries, also called 
stabbing queries, in $O(\log_B N + K/B)$ I/Os. The two-dimensional version of the 
problem we consider in this paper can be solved using the linear space R-tree [18] 
or its variants (see e.g. [17] for a survey); very recently Arge et al. [5] designed 
a variant that answers a query in worst-case $O(\sqrt{N/B} + K/B)$ I/Os. This is 
optimal in the very restrictive bounding-volume hierarchy model [1]. However, 
in the internal memory pointer machine model a linear size data structure that 
can answer a query in the optimal $O(\log N + K)$ time has been developed [10]. 
Thus, based on the correspondence between internal and external memory results 
for the range searching problem, as well as for the one-dimensional enclosure 
problem, one would expect to be able to develop a linear space and $O(\log_B N + K/B)$ I/O query structure. No such structure is known. 



1.2 Our Results 

Surprisingly, in this paper we show that in the indexability model it is not 
possible to design a linear sized point enclosure indexing scheme that can 
answer a query in $O(\log_B N + K/B)$ I/Os. More precisely, we show that to 
obtain an $O(\log_B N + K/B)$ query bound, $\Omega(N/B^{1-\epsilon})$ disk blocks are needed 
for some constant $\epsilon > 0$; with linear space the best obtainable query bound 
is $\Omega(\log_2 N + K/B)$. Thus in some sense this problem is harder in external 
memory than in internal memory. An interesting corollary to this result is that, 
unlike in internal memory, an external interval tree cannot be made partially 
persistent without increasing the asymptotic space or query bound. Refer to 
Table 1 for a summary of tight complexity results for the orthogonal range 
search and point enclosure problems when we are restricted to linear space or 
a logarithmic (base 2 or B) query bound. 
Table 1. Summary of asymptotically tight complexity results for the orthogonal range 
searching and point enclosure problems in the internal memory pointer machine model 
and the external memory indexability model. 

Range searching                      | Internal memory                | External memory
Query with linear space              | [9]                            | $(N/B)^{\epsilon}$ [6]
Space with log (base 2 or B) query   | $N\log N/\log\log N$ [10,11]   | $\frac{N}{B}\cdot\frac{\log(N/B)}{\log\log_B N}$ [6]

Point enclosure                      | Internal memory                | External memory
Query with linear space              | $\log_2 N$ [10]                | $\log_2(N/B)$ (new)
Space with log (base 2 or B) query   | $N$ [10]                       | $N/B^{1-\epsilon}$ (new)



In Section 2 we prove our lower bounds by proving a general tradeoff between 
the size of a point enclosure indexing scheme and its query cost. To do so, we 
refine the indexability model in a way similar to [6]: we introduce parameters 
$A_0$ and $A_1$ instead of A, and require that any query q can be covered by at 
most $A_0 + A_1\lceil |q|/B \rceil$ blocks rather than $A\lceil |q|/B \rceil$ blocks. With this refinement, 
we prove the lower bound tradeoff $A_0 A_1^2 = \Omega(\log(N/B)/\log r)$, which leads 
to the results mentioned above. Interestingly, if the known tradeoff for range 
search indexing schemes is expressed in the refined indexability model, it becomes 
$r = \Omega(\log(N/B)/\log(A_0 \cdot A_1))$. Thus, the tradeoffs of the dual range search and 
point enclosure indexing problems are symmetric with respect to r and $A_0, A_1$. 

In Section 3 we show that our lower bound for the tradeoff is asymptotically 
tight in the most interesting case $A_1 = O(1)$ by developing a family of external 
memory data structures (where the search cost is considered) with optimal space 
and query cost. More precisely, we describe a structure that, for any $2 \le r \le B$, 
uses $O(rN/B)$ disk blocks and answers queries in $O(\log_r(N/B) + K/B)$ I/Os. 
The structure can be constructed in $O(r\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, where M is the size 
of main memory. 



2 Lower Bounds 

2.1 Refined Redundancy Theorem 

The Redundancy Theorem of [24,19] is the main tool in most indexability model 
lower bound results. We develop a version of the theorem for the refined indexability 
model, that is, the model where the access overhead A has been replaced 
by two parameters $A_0$ and $A_1$, such that a query q is required to be covered by at 
most $A_0 + A_1\lceil |q|/B \rceil$ blocks. 

Theorem 1 (Redundancy Theorem [24,19]). For a workload W = (O, Q), 
where $Q = \{q_1, q_2, \ldots, q_m\}$, let S be an indexing scheme with access overhead 
$A \le \sqrt{B}/4$ such that for any $1 \le i, j \le m$, $i \ne j$: 

$$|q_i| \ge B/2 \quad\text{and}\quad |q_i \cap q_j| \le \frac{B}{16A^2};$$

then the redundancy of S is bounded by 

$$r \ge \frac{1}{12N}\sum_{i=1}^{m}|q_i|.$$

We extend this theorem to the refined indexability model as follows. 

Theorem 2 (Refined Redundancy Theorem). For a workload W = (O, Q), 
where $Q = \{q_1, q_2, \ldots, q_m\}$, let S be an indexing scheme with access overhead 
$(A_0, A_1)$ with $A_1 \le \sqrt{B}/8$ such that for any $1 \le i, j \le m$, $i \ne j$: 

$$|q_i| \ge BA_0 \quad\text{and}\quad |q_i \cap q_j| \le \frac{B}{64A_1^2};$$

then the redundancy of S is bounded by 

$$r \ge \frac{1}{12N}\sum_{i=1}^{m}|q_i|.$$

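Proof. For any i we have $|q_i| \ge BA_0$, so $A_0 \le \lceil |q_i|/B \rceil$, and $q_i$ is required to be covered by at most 
$A_0 + A_1\lceil |q_i|/B \rceil \le 2A_1\lceil |q_i|/B \rceil$ blocks (using $A_1 \ge 1$). If we set $A = 2A_1$, S is an indexing scheme 
that covers each query of Q by at most $A\lceil |q|/B \rceil$ blocks, with $A \le \sqrt{B}/4$ and $B/(16A^2) = B/(64A_1^2)$. Then applying Theorem 1 leads 
to the desired result. 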

2.2 Lower Bound for Point Enclosure 

In order to apply Theorem 2 to the point enclosure problem we need to design 
a workload W = (O, Q) such that each query is sufficiently large but the 
intersection of any two is relatively small. To do so we use the point set called a 
Fibonacci lattice, which was also used in previous results on orthogonal range 
searching [22,6,19]. 

Definition 1 ([23]). The Fibonacci lattice $F_m$ is a set of two-dimensional 
points defined by $F_m = \{(i, i f_{k-1} \bmod m) \mid i = 0, 1, \ldots, m-1\}$, where $m = f_k$ is 
the kth Fibonacci number. 
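The lattice of Definition 1 is easy to generate. This Python sketch assumes the indexing convention $f_0 = 0$, $f_1 = 1$ (so $f_7 = 13$ and $f_6 = 8$); the helper names are ours. Because $f_{k-1}$ and $f_k$ are coprime, the points have distinct x- and distinct y-coordinates.

```python
def fibonacci(k):
    """k-th Fibonacci number, with f_0 = 0 and f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def fibonacci_lattice(k):
    """Return m = f_k and the points (i, i * f_{k-1} mod m) for i = 0, ..., m - 1."""
    m, f_prev = fibonacci(k), fibonacci(k - 1)
    return m, [(i, (i * f_prev) % m) for i in range(m)]

def count_points(points, x0, x1, y0, y1):
    """Number of lattice points in the half-open rectangle [x0, x1) x [y0, y1)."""
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)

m, F = fibonacci_lattice(7)
print(m, F[:4])                        # 13 [(0, 0), (1, 8), (2, 3), (3, 11)]
print(count_points(F, 0, 6, 0, 6))     # 3
```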

The following property of a Fibonacci lattice will be crucial in our lower 
bound tradeoff proof. 

Lemma 1 ([16]). For the Fibonacci lattice $F_m$ and for $\alpha > 0$, any rectangle 
with area $\alpha m$ contains between $\lfloor \alpha/c_1 \rfloor$ and $\lceil \alpha/c_2 \rceil$ points, where $c_1 \approx 1.9$ and 
$c_2 \approx 0.45$. 

Let $m = \lambda\frac{\alpha N}{BA_0}$, where α is a parameter to be determined later, and $1 \le \lambda < 2$ 
is chosen such that m is a Fibonacci number. The idea is to use the points in $F_m$ 
as queries to form Q and construct approximately N rectangles as objects to form 
O; in the previous range searching tradeoff the roles of the points and rectangles 
were reversed [6]. We may not construct exactly N rectangles but the number 
will be between λN and 4λN, so this will not affect our asymptotic result. 

We construct rectangles of dimensions $\alpha t^i \times m/t^i$, where $t = (m/\alpha)^{1/(BA_0)}$ 
and $i = 1, \ldots, BA_0$. For each choice of i, the rectangles are constructed in a tiling 
fashion on $F_m$ until they cross the boundary. In this way, between $\lambda N/(BA_0)$ 
and $4\lambda N/(BA_0)$ rectangles are constructed on each layer (each i), for a total of $\Theta(N)$ 
rectangles. 
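The layered construction can be checked numerically on a small lattice. This Python sketch uses hypothetical parameters ($m = f_8 = 21$, $\alpha = 3$, and 4 layers standing in for $BA_0$) and half-open tiles anchored at the origin, which is one natural reading of "tiling fashion". It checks that every layer has between $m/\alpha$ and $4m/\alpha$ tiles, and that two lattice points with coordinate differences x and y share a tile in at most $1 + \log_t(\alpha m/(xy))$ layers, the count used in the next step of the argument.

```python
import math

def lattice(m, f_prev):
    return [(i, (i * f_prev) % m) for i in range(m)]

def layer_sizes(m, alpha, num_layers):
    """t = (m/alpha)^(1/num_layers); tile sizes (alpha*t^i, m/t^i) for i = 1..num_layers."""
    t = (m / alpha) ** (1.0 / num_layers)
    return t, [(alpha * t ** i, m / t ** i) for i in range(1, num_layers + 1)]

def tile(point, w, h):
    """Index of the half-open tile [a*w, (a+1)*w) x [b*h, (b+1)*h) containing point."""
    x, y = point
    return (math.floor(x / w), math.floor(y / h))

m, f_prev, alpha, layers = 21, 13, 3.0, 4
pts = lattice(m, f_prev)
t, sizes = layer_sizes(m, alpha, layers)

for w, h in sizes:                       # every tile has area alpha * m
    n_tiles = math.ceil(m / w) * math.ceil(m / h)
    assert m / alpha - 1e-9 <= n_tiles <= 4 * m / alpha + 1e-9

worst_excess = 0.0                       # largest (shared layers - bound) observed
for i, p in enumerate(pts):
    for q in pts[i + 1:]:
        shared = sum(tile(p, w, h) == tile(q, w, h) for w, h in sizes)
        x, y = abs(p[0] - q[0]), abs(p[1] - q[1])    # both > 0 on this lattice
        bound = max(0.0, 1 + math.log(alpha * m / (x * y)) / math.log(t))
        worst_excess = max(worst_excess, shared - bound)
print("largest (shared - bound):", worst_excess)
```

Sharing a tile in layer i forces $x < \alpha t^i$ and $y < m/t^i$, so only the integers i with $x/\alpha < t^i < m/y$ can contribute, which is where the logarithmic bound comes from.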

Since each point is covered by $BA_0$ rectangles, one from each layer, the first 
condition of Theorem 2 is satisfied. For the second one, consider any two different 
query points $q_1$ and $q_2 \in F_m$, and let x and y be the differences between their 
x- and y-coordinates, respectively. Since the rectangle with $q_1$ and $q_2$ as corners 
contains at least two points, namely $q_1$ and $q_2$ themselves, by Lemma 1 we have 
$xy \ge c_2 m$. We now look at how many rectangles can possibly cover both $q_1$ and 
$q_2$. For a rectangle with dimensions $\alpha t^i \times m/t^i$ to cover both points, we must 
have $\alpha t^i \ge x$ and $m/t^i \ge y$, or equivalently, $x/\alpha \le t^i \le m/y$. There is at most one rectangle for each 
i that can cover both points; thus, $|q_1 \cap q_2|$ is at most 

$$1 + \log_t\frac{\alpha m}{xy} \le 1 + \frac{\log(\alpha/c_2)}{\log t} = 1 + BA_0\,\frac{\log(\alpha/c_2)}{\log(m/\alpha)} = 1 + BA_0\,\frac{\log(\alpha/c_2)}{\log(\lambda N/(BA_0))}.$$

The second condition holds as long as 

$$\frac{1}{B} + A_0\,\frac{\log(\alpha/c_2)}{\log(\lambda N/(BA_0))} \le \frac{1}{64A_1^2}. \qquad (1)$$

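On the other hand, when (1) is satisfied, Theorem 2 gives (recall that the number of objects is at most 4λN and each of the m queries has size $BA_0$) 

$$r \ge \frac{1}{12}\cdot\frac{mBA_0}{4\lambda N} = \frac{\alpha}{48}.$$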

Therefore, any combination of α, $A_0$ and $A_1$ that satisfies (1) constitutes a lower 
bound on the tradeoff of r, $A_0$ and $A_1$. When $A_0$ and $A_1$ are small enough, 
namely $A_0 \le (N/B)^{1-\delta}$, where $0 < \delta < 1$ is some constant, and $A_1 \le \sqrt{B/128}$, 
we can simplify (1) to obtain the following: 



Theorem 3. Let S be a point enclosure indexing scheme on the Fibonacci workload. 
If $A_0 \le (N/B)^{1-\delta}$ for any fixed $0 < \delta < 1$, and $A_1 \le \sqrt{B/128}$, then its 
redundancy r and access overhead $(A_0, A_1)$ must satisfy 

$$A_0 A_1^2 \ge \frac{\delta}{128}\cdot\frac{\log(N/B)}{7 + \log r} = \Omega\left(\frac{\log(N/B)}{\log r}\right).$$
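The simplification of (1) that yields Theorem 3 is compressed in the text. One way to fill it in, assuming base-2 logarithms, $\lambda \ge 1$, and $c_2 \approx 0.45$ from Lemma 1, is to choose α as large as the simplified condition allows:

```latex
\begin{aligned}
&\text{Choose } \alpha \text{ with } \log\frac{\alpha}{c_2} = \frac{\delta\log(N/B)}{128\,A_0A_1^2}.\\
&A_1 \le \sqrt{B/128} \;\Rightarrow\; \frac{1}{B} \le \frac{1}{128A_1^2};\qquad
 A_0 \le (N/B)^{1-\delta},\ \lambda \ge 1 \;\Rightarrow\; \log\frac{\lambda N}{BA_0} \ge \delta\log\frac{N}{B}.\\
&\text{Hence } \frac{1}{B} + A_0\,\frac{\log(\alpha/c_2)}{\log(\lambda N/(BA_0))}
 \le \frac{1}{128A_1^2} + \frac{1}{128A_1^2} = \frac{1}{64A_1^2}, \text{ so (1) holds.}\\
&r \ge \frac{\alpha}{48} \;\Rightarrow\; \log r \ge \frac{\delta\log(N/B)}{128\,A_0A_1^2} - \log\frac{48}{c_2}
 > \frac{\delta\log(N/B)}{128\,A_0A_1^2} - 7 \quad (\log_2(48/0.45) \approx 6.74).\\
&\text{Rearranging: } A_0A_1^2 \ge \frac{\delta}{128}\cdot\frac{\log(N/B)}{7 + \log r}.
\end{aligned}
```

Note also that $\sqrt{B/128} < \sqrt{B}/8$, so the requirement $A_1 \le \sqrt{B}/8$ of Theorem 2 is met.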



The Fibonacci lattice is only one of many low-discrepancy point sets [23] we 
could have used, but none of them would improve the result of Theorem 3 by 
more than a constant factor. We note that the above proof technique could 
also have been used to obtain a tradeoff in the original indexability model. 
However, in that case we would obtain a significantly less informative tradeoff 
of $A = \Omega(\sqrt{\log(N/B)/\log r})$. 
2.3 Tradeoff Implications 



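Query cost $\log_B N$. In indexing schemes with an $O(\log_B N + K/B)$ query cost 
we have $A_0 = O(\log_B N)$ and $A_1 = O(1)$. Assuming that $A_0 \le c_0\log_B(N/B)$, we 
have 

$$c_0\,\frac{\log(N/B)}{\log B}\,A_1^2 \ge A_0A_1^2 \ge \frac{\delta}{128}\cdot\frac{\log(N/B)}{7 + \log r},$$

which is 

$$\log r \ge \frac{\delta}{128c_0A_1^2}\log B - 7, \quad\text{or}\quad r \ge \frac{B^{\delta/(128c_0A_1^2)}}{128}.$$

Thus any such indexing scheme must have redundancy at least $\Omega(B^{\epsilon})$, for some 
small constant ε. 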



Corollary 1. Any external memory data structure for the point enclosure problem 
that can answer a query in $O(\log_B N + K/B)$ I/Os in the worst case must 
use $\Omega(N/B^{1-\epsilon})$ disk blocks for some constant $\epsilon > 0$. 



Linear space. In linear space indexing schemes r is a constant. If we also want 
$A_1$ to be a constant, $A_0$ has to be $\Omega(\log_2(N/B))$ by Theorem 3. As mentioned, 
this is a rather surprising result, since linear space structures with the optimal 
$O(\log N + K)$ query time exist in internal memory [10]. 

Corollary 2. Any linear sized external memory data structure for the point 
enclosure problem that answers a query in $f(N) + O(K/B)$ I/Os in the worst 
case must have $f(N) = \Omega(\log_2(N/B))$. 

Note, however, that our lower bound does not rule out linear size indexes with 
a query cost of, for example, $O(\log_B N + \sqrt{\log B}\cdot K/B)$. 

Persistent interval tree. An interesting consequence of the above linear space 
result is that, unlike in internal memory, we cannot make the external interval 
tree [7] partially persistent [15] without either increasing the asymptotic space 
or query bound. Details will appear in the full paper. 



3 Upper Bounds 

In this section, we develop a family of external memory data structures with 
space and query bounds that match the lower bound tradeoff of Theorem 3. 
In Section 3.1 we first develop a structure that uses $O(rN/B)$ disk blocks and 
answers queries in $O(\log_B N \cdot \log_r(N/B) + K/B)$ I/Os for any $2 \le r \le B$. 
In Section 3.2 we then discuss how to improve the query bound to obtain the 
$O(\log_r(N/B) + K/B)$ bound matching Theorem 3. These bounds are measured 
in the classical I/O model [2], where the search cost is considered, and where 
space used to store auxiliary information (such as "directories" or "internal 
nodes") is also counted when bounding the sizes of the structures. 
Fig. 1. Partitioning of rectangles among nodes in a 3-ary base tree. 



3.1 Basic Point Enclosure Structure 

Base tree T. Let S be a set of N axis-parallel rectangles in the plane. Our 
structure for answering point enclosure queries on S is similar to an internal 
memory point enclosure structure due to Chazelle [10]. Intuitively, the idea is 
to build an r-ary base tree¹ T by first dividing the plane into $r \le B$ horizontal 
slabs with approximately the same number of rectangle corners in each slab, and 
then recursively constructing a tree on the rectangles completely contained in each 
of the slabs. The rectangles that cross one or more slab boundaries are stored in 
a secondary structure associated with the root of T. The recursion ends when 
the slabs contain at most 4B rectangle corners each. Refer to Figure 1. 

The base tree T can relatively easily be constructed and the rectangles distributed 
to the internal nodes of T in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os (the number of I/Os 
needed to sort N elements): We first sort the rectangle corners by y-coordinate. 
Then we construct the top $O(\log_r\frac{M}{B})$ levels of the tree, that is, $O(M/B)$ nodes, 
in $O(N/B)$ I/Os. Finally, we distribute the remaining (sorted) rectangles to 
the leaves of the constructed tree and recursively construct the corresponding 
subtrees. Details will appear in the full version of this paper. 

Now let $N_v$ be the number of rectangles associated with an internal node 
v in T. Below we describe how the secondary structure associated with v uses 
$O(rN_v/B + r)$ blocks and can be constructed in $O(r\frac{N_v}{B}\log_{M/B}\frac{N_v}{B})$ I/Os. Since 
there are $O(N/(Br))$ internal nodes, and since each rectangle is stored in exactly 
one leaf or internal node, this means that after distributing the rectangles to 
nodes of T, all the secondary structures use $O(rN/B)$ blocks and they can be 
constructed in $O(r\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os. 

Answering a query on T. To answer a query $q = (x_q, y_q)$ on T we simply query 
the secondary structure of the root v of T, and then recursively query the tree 
rooted in the node corresponding to the horizontal slab containing q. When we 
reach a leaf, we simply load the at most B rectangles in the leaf and report 
all rectangles that contain q. Below we describe how a query on the secondary 
structure of v can be performed in $O(\log_B N + K_v/B)$ I/Os, where $K_v$ is the 
number of reported rectangles. Thus a query is answered in $O(\log_B N \cdot \log_r\frac{N}{B} + K/B)$ I/Os overall. 

¹ The fanout of the root of the base tree may be less than r to make sure that we have 
O(N/B) nodes in the tree. 

Secondary structures. All that remains is to describe the secondary structure 
used to store the $N_v$ rectangles associated with an internal node v of T. The 
structure actually consists of 2r small structures, namely two structures for 
each of the r horizontal slabs associated with v. The first structure $\Psi_t^i$ for slab i 
contains the top (horizontal) sides of all rectangles with top side in slab i, as well 
as all rectangles that completely span slab i; the second structure $\Psi_b^i$ contains 
the bottom (horizontal) sides of all rectangles with bottom side in slab i. The 
structure $\Psi_t^i$ supports upward ray-shooting queries from a point $q = (x_q, y_q)$, that 
is, it can report all segments intersected by the vertical half-line from $(x_q, y_q)$ 
to $(x_q, +\infty)$. Refer to Figure 2. Similarly, $\Psi_b^i$ supports downward ray-shooting 
queries. Thus, it is easy to see that to answer a point enclosure query $q = (x_q, y_q)$ 
in v, all we need to do is to query the $\Psi_t^i$ and $\Psi_b^i$ structures of the slab i containing 
point $(x_q, y_q)$. Refer to Figure 3. 
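The base tree and its per-slab secondary structures can be prototyped in internal memory to see why the two ray-shooting queries suffice. The Python sketch below illustrates only the decomposition, not the I/O bounds: the names are hypothetical, rectangles are closed tuples (x1, x2, y1, y2), and the two per-slab structures are plain lists scanned linearly, standing in for the persistent B-trees described next. We assume a rectangle spanning slab i is stored in slab i's top list at height +∞, so that an upward ray from any point in the slab hits it.

```python
import bisect
import random

class SlabNode:
    """One node of an r-ary base tree over closed rectangles (x1, x2, y1, y2)."""

    def __init__(self, rects, r=3, leaf_corners=8):
        ys = sorted(y for (_, _, y1, y2) in rects for y in (y1, y1, y2, y2))
        self.leaf = rects if len(ys) <= leaf_corners else None
        if self.leaf is not None:
            return
        # Slab boundaries chosen so each slab gets roughly the same number of corners.
        self.bounds = sorted({ys[len(ys) * k // r] for k in range(1, r)})
        slabs = len(self.bounds) + 1
        inside = [[] for _ in range(slabs)]
        self.tops = [[] for _ in range(slabs)]      # stands in for Psi_t^i
        self.bottoms = [[] for _ in range(slabs)]   # stands in for Psi_b^i
        for rect in rects:
            lo, hi = self.slab(rect[2]), self.slab(rect[3])
            if lo == hi:                            # completely inside one slab
                inside[lo].append(rect)
                continue
            self.tops[hi].append((rect[3], rect))   # top side lies in slab hi
            self.bottoms[lo].append((rect[2], rect))  # bottom side lies in slab lo
            for s in range(lo + 1, hi):             # slabs spanned completely
                self.tops[s].append((float("inf"), rect))
        if max(len(c) for c in inside) == len(rects):
            self.leaf = rects                       # no progress possible: stop
            return
        self.children = [SlabNode(c, r, leaf_corners) for c in inside]

    def slab(self, y):
        """Index of the slab [bounds[i-1], bounds[i]) containing y."""
        return bisect.bisect_right(self.bounds, y)

    def query(self, xq, yq, out):
        if self.leaf is not None:
            out.extend(rc for rc in self.leaf
                       if rc[0] <= xq <= rc[1] and rc[2] <= yq <= rc[3])
            return out
        s = self.slab(yq)
        # Upward ray from (xq, yq): top sides (and spanning rectangles) of slab s.
        out.extend(rc for h, rc in self.tops[s] if h >= yq and rc[0] <= xq <= rc[1])
        # Downward ray from (xq, yq): bottom sides of slab s.
        out.extend(rc for h, rc in self.bottoms[s] if h <= yq and rc[0] <= xq <= rc[1])
        return self.children[s].query(xq, yq, out)

rects = [(0, 10, 0, 10), (2, 4, 3, 20), (5, 6, 12, 15), (8, 12, 9, 11), (1, 9, 14, 18)]
tree = SlabNode(rects)
print(sorted(tree.query(3, 9, [])))
```

Every rectangle stored at a node has its two horizontal sides in different slabs, so for the slab s containing q it falls into exactly one of three cases (top side in s, spanning s, bottom side in s), and one ray in the matching direction decides whether it contains q.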

All $\Psi_t^i$ and $\Psi_b^i$ structures are implemented in the same basic way, using a (partially) 
persistent B-tree [8,4]. A persistent B-tree uses $O(N/B)$ blocks, supports 
updates in $O(\log_B N)$ I/Os in the current version of the structure as normal 
B-trees, and answers range queries in $O(\log_B N + K/B)$ I/Os in all the previous 
versions of the structure. In these bounds, N is the number of updates performed 
on it. To build $\Psi_t^i$ we simply imagine sweeping the plane with a vertical 
line from $-\infty$ to $+\infty$, while inserting the y-coordinate $y_1$ of a horizontal segment 
$(x_1, y_1, x_2)$ when its left endpoint $x_1$ is reached, and deleting it again when its 
right endpoint $x_2$ is reached. An upward ray-shooting query $q = (x_q, y_q)$ can 
then be answered in $O(\log_B N + K_v/B)$ I/Os as required, simply by performing 
a range query $[y_q, +\infty)$ on the structure we had when the sweep-line was at $x_q$. 
$\Psi_b^i$ is constructed in a symmetric manner. 
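The sweep that builds a $\Psi_t^i$ structure can be imitated with a naive form of partial persistence that copies the whole active set after the events at each x-coordinate. This Python sketch only shows how a query at $x_q$ becomes a range query $[y_q, +\infty)$ on the version current at $x_q$; the full copies cost quadratic space, which is exactly what a persistent B-tree [8,4] avoids. As a simplifying assumption, segments (x1, y, x2) have x1 < x2 and are active for x1 ≤ x < x2 (closed segments only need a different tie-breaking rule at x2); the function names are hypothetical.

```python
import bisect
import random

def build_versions(segments):
    """Sweep left to right and snapshot the active set after each event coordinate.

    segments: list of (x1, y, x2) with x1 < x2. Returns (xs, versions), where
    versions[k] is the sorted list of (y, id) active for xs[k] <= x < xs[k+1]."""
    events = sorted([(x1, 1, y, i) for i, (x1, y, x2) in enumerate(segments)] +
                    [(x2, 0, y, i) for i, (x1, y, x2) in enumerate(segments)])
    xs, versions, active = [], [], []
    k = 0
    while k < len(events):
        x = events[k][0]
        while k < len(events) and events[k][0] == x:
            _, kind, y, i = events[k]
            if kind:                                  # left endpoint: insert y
                bisect.insort(active, (y, i))
            else:                                     # right endpoint: delete y
                active.pop(bisect.bisect_left(active, (y, i)))
            k += 1
        xs.append(x)
        versions.append(list(active))                 # naive "persistence": full copy
    return xs, versions

def ray_up(xs, versions, xq, yq):
    """Ids of segments hit by the upward ray from (xq, yq): range query [yq, +inf)."""
    k = bisect.bisect_right(xs, xq) - 1               # version current at xq
    if k < 0:
        return []
    version = versions[k]
    return [i for _, i in version[bisect.bisect_left(version, (yq, -1)):]]

xs, versions = build_versions([(0, 5, 10), (2, 7, 4), (3, 1, 8)])
print(ray_up(xs, versions, 3, 2))   # [0, 1]
```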



Fig. 2. An upward ray-shooting query. 
Fig. 3. Answering a point enclosure query. 
The top horizontal segment of each of the $N_v$ rectangles associated with v 
is stored in at most r $\Psi_t$ structures, while each bottom segment is stored in exactly 
one $\Psi_b$ structure. Therefore, since a $\Psi_t$ or $\Psi_b$ structure on L segments uses $O(L/B)$ blocks 
and can be constructed in $O(\frac{L}{B}\log_{M/B}\frac{L}{B})$ I/Os [26], the 2r structures in v use 
$O(rN_v/B + r)$ blocks and can be constructed in $O(r\frac{N_v}{B}\log_{M/B}\frac{N_v}{B})$ I/Os overall. 

Theorem 4. A set of N axis-parallel rectangles in the plane can be stored in 
an external memory data structure that uses $O(rN/B)$ blocks, such that a point 
enclosure query can be answered in $O(\log_B N \cdot \log_r\frac{N}{B} + K/B)$ I/Os, for any 
$2 \le r \le B$. The structure can be constructed using $O(r\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os. 



3.2 Improved Point Enclosure Structure 

In this section we sketch how to improve the $O(\log_B N \cdot \log_r(N/B) + K/B)$ query 
bound of the structure described in the previous section to $O(\log_r(N/B) + K/B)$. 
First note that the extra $\log_B N$ term was a result of the query procedure using 
$O(\log_B N)$ I/Os to search a secondary structure (a $\Psi_t$ and a $\Psi_b$ structure) on 
each level of the base tree T. Since all of these structures are queried with the 
same query point q, it is natural to use fractional cascading [12] to reduce the 
overall search cost to $O(\log_B N + \log_r(N/B) + K/B) = O(\log_r(N/B) + K/B)$. 
However, in order to do so we need to modify the $\Psi_t$ and $\Psi_b$ structures slightly. 
Below we first discuss this modification and then sketch how fractional cascading 
can be applied to our structure. We will only consider the $\Psi_t$ structure since the 
$\Psi_b$ structure can be handled in a similar way. 

Modified secondary structure. The modification of the $\Psi_t$ structure is inspired
by the so-called hive-graph technique of Chazelle [10]. In order to describe it,
we need the following property of a $\Psi_t$ structure (that is, of a persistent B-tree),
which follows easily from the discussions in [8].

Lemma 2. Each of the $O(L/B)$ leaves of a persistent B-tree $\Psi_t$ on $L$ segments
defines a rectangular region in the plane and stores the (at most) $B$ segments
intersecting this region. The regions of all the leaves define a subdivision of the
minimal bounding box of the segments in $\Psi_t$, and no two regions overlap. An
upward ray-shooting query on $\Psi_t$ visits $O(1 + K/B)$ leaves.

By Lemma 2, an upward ray-shooting query $q = (x_q, y_q)$ on a $\Psi_t$ in node $v$ will
visit $O(1 + K_v/B)$ regions (leaves) in the subdivision induced by the leaves. Now
suppose we have links from every region (leaf) to its downward neighbors. Given
the highest region whose $x$-range contains $x_q$, we could then follow these links
downward to answer the query. To find the relevant highest regions, we introduce a head
list consisting of the left $x$-coordinates of the $O(L/B)$ highest regions in sorted
order, where each entry also has a pointer to the corresponding region/leaf. The
top region can then be found simply by searching the head list. Refer to Figure 4.
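The query procedure just described can be sketched in Python as follows. This is a hypothetical in-memory model, not the external-memory structure itself: each `Region` (a name we introduce) is a half-open rectangle $[x_{lo}, x_{hi}) \times [y_{lo}, y_{hi})$ holding the horizontal segments $(x_1, x_2, y)$ that intersect it, `down` lists its downward neighbors, the topmost regions are assumed to extend to $+\infty$, and one region visit stands in for one I/O.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Region:
    """A leaf of the (hypothetical) subdivision: [x_lo, x_hi) x [y_lo, y_hi)."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    segments: list                              # horizontal segments (x1, x2, y)
    down: list = field(default_factory=list)    # downward neighbours


def ray_shoot_up(head_xs, head_regions, xq, yq):
    """Report all segments hit by the upward ray from q = (xq, yq)."""
    # Search the head list (sorted left x-coordinates of the topmost regions).
    region = head_regions[bisect_right(head_xs, xq) - 1]
    hits = []
    while region is not None:
        for (x1, x2, y) in region.segments:
            # Only segments crossing x = xq inside this region's y-range are
            # reported here, so no segment is reported twice.
            if x1 <= xq <= x2 and max(yq, region.y_lo) <= y < region.y_hi:
                hits.append((x1, x2, y))
        if region.y_lo <= yq:       # this region contains q: stop descending
            break
        # Follow the downward link whose x-range contains xq.
        region = next((r for r in region.down if r.x_lo <= xq < r.x_hi), None)
    return hits
```

Scanning `region.down` is meant to model reading a single block, which is only valid while every region has $O(B)$ downward links.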

Fig. 4. The leaves of a persistent B-tree define a rectangular subdivision.
Fig. 5. Reorganizing the leaves in the persistent B-tree ($B = 2$).

There is one problem with the above approach, namely that a region may
have more than $B$ downward neighbors (so that following the right link may
require more than one I/O). We therefore modify the subdivision slightly: we
imagine sweeping a horizontal line from $-\infty$ to $\infty$, and every time we meet the
bottom edge of a region with more than $B$ downward links, we split it into
smaller regions with $O(B)$ downward links each. We construct a new leaf for
each of these regions, each containing the relevant segments from the original
region (the segments intersecting the new region). Refer to Figure 5. Chazelle
showed that the number of new regions (leaves) introduced during the sweep is
$O(L/B)$ [10], so the modified subdivision still consists of $O(L/B)$ regions. It can
also be constructed in $O(\frac{L}{B}\log_{M/B}\frac{L}{B})$ I/Os. Due to lack of space, we omit the
details. They will appear in the full paper.
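The splitting step applied to a single region can be sketched as follows. This is a simplified, hypothetical illustration: it assumes the downward neighbors are given by their sorted left $x$-coordinates `child_lefts` (with `child_lefts[0]` equal to the region's left edge), cuts the region at the left edge of every $B$-th neighbor so that each piece has at most $B$ downward links, and copies into each new leaf the segments that intersect it. It does not model the sweep itself or the counting argument of Chazelle [10].

```python
def split_region(x_lo, x_hi, child_lefts, segments, B):
    """Split [x_lo, x_hi) into pieces with at most B downward neighbours each.

    child_lefts: sorted left x-coordinates of the downward neighbours,
                 with child_lefts[0] == x_lo.
    segments:    horizontal segments (x1, x2, y) stored in the region.
    Returns a list of (lo, hi, children, segments) tuples, one per new leaf.
    """
    # Cut at the left edge of every B-th downward neighbour.
    cuts = child_lefts[::B] + [x_hi]
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        children = [c for c in child_lefts if lo <= c < hi]
        # A new leaf keeps the segments of the original region that intersect it.
        segs = [s for s in segments if s[0] < hi and s[1] >= lo]
        pieces.append((lo, hi, children, segs))
    return pieces
```

A segment spanning several pieces is copied into each of them, which is why the new leaves may together store more segments than the original one.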

Lemma 3. A modified $\Psi_t$ structure on $L$ segments consists of $O(L/B)$ blocks
and a head list of $O(L/B)$ $x$-coordinates, such that after locating $x_q$ in the head
list, an upward ray-shooting query $q = (x_q, y_q)$ can be answered in $O(1 + K/B)$
I/Os. The structure can be constructed in $O(\frac{L}{B}\log_{M/B}\frac{L}{B})$ I/Os.

Fractional cascading. We are now left with the task of locating $x_q$ in the head
list of each $\Psi_t$ structure that we visit along a root-to-leaf path in the base tree
$T$. This repetitive searching problem can be solved efficiently by adapting the
fractional cascading technique [12] to the external memory setting, such that
after locating $x_q$ in the first head list, finding the position of $x_q$ in each of the
subsequent head lists only incurs constant cost. Again, due to space limitations
we omit the details. In the full version of this paper we show how the search cost
of our structure can be reduced to $O(\log_r \frac{N}{B} + K/B)$ I/Os while maintaining the
space and construction bounds; this leads to our main result.
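For intuition, the classical in-memory form of fractional cascading along a single path of sorted lists can be sketched as below; the external-memory adaptation is omitted in the text, so this is not that structure. Each augmented list $M_i$ merges $L_i$ with every second element of $M_{i+1}$ and stores, per position, the answer in $L_i$ (`orig`) and a bridge pointer into $M_{i+1}$ (`down`). After one binary search in $M_1$, each further list costs $O(1)$ steps. The function names are ours, and construction simply sorts instead of merging.

```python
from bisect import bisect_left


def build_cascade(lists):
    """Build augmented lists M_1..M_k for sorted lists L_1..L_k along one path.

    Each entry of the result is (M_i, orig, down): orig[j] is the position in
    L_i of the first element >= M_i[j], and down[j] is the position in M_{i+1}
    of the first element >= M_i[j]; both have one extra slot for "past the end".
    """
    aug = [None] * len(lists)
    nxt = []                                   # M_{i+1}, empty for the last list
    for i in range(len(lists) - 1, -1, -1):
        merged = sorted(lists[i] + nxt[1::2])  # L_i plus every second element of M_{i+1}
        orig = [bisect_left(lists[i], v) for v in merged] + [len(lists[i])]
        down = [bisect_left(nxt, v) for v in merged] + [len(nxt)]
        aug[i] = (merged, orig, down)
        nxt = merged
    return aug


def cascade_search(aug, x):
    """For every L_i, return the index of its first element >= x."""
    merged, orig, _ = aug[0]
    p = bisect_left(merged, x)                 # the only full binary search
    answers = [orig[p]]
    for i in range(1, len(aug)):
        p = aug[i - 1][2][p]                   # bridge pointer into M_i
        merged, orig, _ = aug[i]
        # Sampling every second element means this loop runs O(1) times.
        while p > 0 and merged[p - 1] >= x:
            p -= 1
        answers.append(orig[p])
    return answers
```

For example, with `lists = [[1, 5, 9], [2, 3, 8, 10], [4, 6, 7]]`, `cascade_search(build_cascade(lists), 7)` returns `[2, 2, 2]`.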

Theorem 5. A set of $N$ axis-parallel rectangles in the plane can be stored in
an external memory data structure that uses $O(rN/B)$ blocks such that a point
enclosure query can be answered in $O(\log_r \frac{N}{B} + K/B)$ I/Os, for any $2 \le r \le B$.
The structure can be constructed with $O(r\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os.
Abstract. We study a basic problem in Multi-Queue switches. A switch
connects $m$ input ports to a single output port. Each input port is
equipped with an incoming FIFO queue with bounded capacity $B$. A
switch serves its input queues by transmitting packets arriving at these
queues, one packet per time unit. Since the arrival rate can be higher than
the transmission rate and each queue has limited capacity, packet loss
may occur as a result of insufficient queue space. The goal is to maximize
the number of transmitted packets. This general scenario models most
current networks (e.g., IP networks), which only support a “best effort”
service in which all packet streams are treated equally. A 2-competitive
algorithm for this problem was designed in [4] for arbitrary $B$. Recently,
a $\frac{17}{9} \approx 1.89$-competitive algorithm was presented for $B > 1$ in [2]. Our
main result in this paper shows that for $B$ that is not too small our
algorithm can do better than 1.89, and approach a competitive ratio of
$\frac{e}{e-1} \approx 1.58$.

1 Introduction 

Overview: Switches are a fundamental part of most networks. Networks use 
switches to route packets arriving at input ports to an appropriate output port, 
so that the packets will reach their desired destination. Since the arrival rate 
of packets can be much higher than the transmission rate, each input port is 
equipped with an incoming queue with bounded capacity. Network traffic, which
has increased steadily in recent years, tends to fluctuate, while incoming queues
have limited space; as a result, packet loss is becoming more of a problem.
Consequently, considerable research has been carried out in recent years on the
design and analysis of various queue management policies.

We model the problem of maximizing switch throughput as follows. A switch 
has m incoming FIFO queues and one output port. At each time unit, new 
packets may arrive at the queues, each packet belonging to a specific input 
queue. A packet can only be stored in its input queue if there is enough space. 
Since the $m$ queues have bounded capacity, packet loss may occur. All packets
have the same value, i.e. all packets are equally important. At each time unit,
the switch can select a non-empty queue and transmit a packet from the head
of that queue. The goal is to maximize the number of transmitted packets.

* Supported by the Israeli Ministry of Industry and Trade and the Israel Science Foundation.

This model is worth studying, since the majority of current networks (most 
notably, IP networks) do not yet integrate full QoS capabilities. Traditionally, 
similar problems were analyzed while assuming either some constant structure 
of the sequence of arriving packets, or a specific distribution of the arrival rates 
(see e.g. [7,15]). We, in contrast, avoid any a priori assumption on the input, and
therefore use competitive analysis to compare the performance of online algorithms
with that of the optimal offline solution, which knows the entire sequence in advance.
Specifically, the competitive ratio of an online algorithm is the supremum, taken
over all finite sequences, of the ratio between the optimal throughput and the
online throughput on the same input.

In [4] Azar and Richter showed that any deterministic work-conserving algorithm,
i.e. one that transmits a packet in each unit of time if not all queues are empty, is
2-competitive. They also showed an $\frac{e}{e-1}$-competitive randomized algorithm.
Recently, Albers and Schmidt [2] introduced a $\frac{17}{9} \approx 1.89$-competitive algorithm for
$B > 1$. In this paper we show that for the case where $B$ is larger than $\log m$ our
algorithm approaches a competitive ratio of $\frac{e}{e-1} \approx 1.58$, which is significantly
better than 1.89.

Our results: 

— Our main contribution is an $\frac{e}{e-1} \approx 1.58$-competitive algorithm for the switch
throughput problem, for any large enough $B$. Specifically, we show a fractional
algorithm for the switch throughput problem, i.e. one that can insert
fractions of packets into each queue if there is sufficient space and transmit
fractions from the queues of total size at most 1, with competitiveness of $\frac{e}{e-1}$.
Then we transform our fractional algorithm into a discrete algorithm, i.e.
one that can insert and transmit integral packets, with competitiveness of
$\frac{e}{e-1}\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$, where $H_m = \sum_{i=1}^{m} \frac{1}{i} \approx \ln m$ is the $m$th harmonic number.

— We use two tools which are of independent interest:

• We consider the online unweighted fractional matching problem in a
bipartite graph. In this problem we have a bipartite graph $G = (R, S, E)$
where $S$ and $R$ are the disjoint vertex sets and $E$ is the edge set. At step
$i$, vertex $r_i \in R$, together with all of its incident edges, arrives online. The
online algorithm can match fractions of $r_i$ to various adjacent vertices
$s_j \in S$. We prove an upper bound of $\frac{e}{e-1}$ on the competitive ratio of a
natural online “water level” matching algorithm.

• We present a generic technique to transform any fractional algorithm for
the switch throughput problem into a discrete algorithm. Specifically, we
show that given a $c$-competitive online fractional algorithm for the switch
throughput problem, we can construct an online discrete algorithm with
competitiveness of $c\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$ for the switch throughput problem.

Side results: 

— We show a lower bound of $\frac{e}{e-1}$ minus a lower-order term on the competitive
ratio of any online unweighted fractional matching algorithm in a bipartite graph.
Thus, we prove that our “water level” matching algorithm is optimal, up to an
additive term smaller than 2 in the size of the matching. In addition, this lower
bound yields a lower bound of approximately $\frac{e}{e-1}$ on the competitive ratio of any
randomized discrete algorithm for the online unweighted matching problem
in a bipartite graph. Therefore, we slightly improve the asymptotic lower
bound of Karp et al. [10] with a non-asymptotic lower bound.

— We consider the online $b$-matching problem. In the online $b$-matching problem
we have a bipartite graph $G = (R, S, E)$, where $r_i \in R$ can be matched
to a single adjacent vertex $s_j \in S$ such that every $s_j$ can be matched up to
$b$ times. We introduce a more compact analysis for the upper bound of the
competitive ratio obtained by algorithm Balance for the $b$-matching problem
that was studied by Kalyanasundaram and Pruhs [9], using a technique similar
to the one used in our analysis of the “water level” algorithm.

Our techniques: We start by studying the online fractional unweighted matching
problem in a bipartite graph. We introduce a natural “water level” algorithm
and obtain a competitiveness of $\frac{e}{e-1}$. We next construct an online reduction
from the fractional switch throughput problem to the problem of finding
a maximum fractional matching in a bipartite graph. Thus, we obtain an
$\frac{e}{e-1}$-competitive algorithm for the fractional switch problem, which emulates the
fractional matching constructed in the bipartite graph. Next, we present a generic
technique to transform any fractional algorithm for the switch throughput problem
into a discrete algorithm. Specifically, we present an algorithm with queues
larger than $B$ that maintains a running simulation of the fractional algorithm in
a relaxed fractional model and transmits from the queue with the largest overload
with respect to the simulation. We then transform this algorithm into an
algorithm with queues of size $B$.

Related results for the switch throughput problem with unit value: 

The online problem of maximizing switch throughput has been studied extensively
in recent years. For unit value scheduling, Azar and Richter [4]
showed that any deterministic work-conserving algorithm, i.e. one that transmits
a packet in each unit of time if not all queues are empty, is 2-competitive. In
addition they showed a lower bound of $2 - \frac{1}{m}$ for the case where $B = 1$. For arbitrary
$B$ they showed a lower bound of $1.366 - \Theta(\frac{1}{m})$. They also considered randomized
online algorithms and presented an $\frac{e}{e-1}$-competitive randomized algorithm. For
$B = 1$, they showed a lower bound of $1.46 - \Theta(\frac{1}{m})$ on the performance of every
randomized online algorithm. Recently, a $\frac{17}{9} \approx 1.89$-competitive deterministic
algorithm was presented by Albers and Schmidt [2] for $B \ge 2$. For the case
$B = 2$ they showed that their algorithm is optimal. They also showed that any
greedy algorithm is at least $(2 - \frac{1}{B})$-competitive for any $B$ and large enough $m$. In
addition, they showed a lower bound of $\frac{e}{e-1}$ on the competitive ratio of any
deterministic online algorithm and a lower bound of 1.4659 on the performance of
every randomized online algorithm, for any $B$ and large enough $m$. We note that
the lower bound obtained by Albers and Schmidt does not apply to $B > \log m$,
which is the case considered in our paper.
Related results for the switch throughput problem with values: For the
case where packets have values, and the goal is to maximize the total value of the
transmitted packets, there are results for a single queue and for a multi-queue
switch in [1,3,12,14,11,4,5]. In particular, for the switch throughput problem
there is a 3-competitive algorithm for the preemptive case, and an algorithm whose
competitive ratio is logarithmic in the range of packet values for the non-preemptive case.

Related results for the cost version: Koga [13], Bar-Noy et al. [6], and
Chrobak et al. [8] showed an $O(\log m)$-competitive algorithm for the problem of
minimizing the length of the longest queue in a switch. In their model, queues
are unbounded in size, and hence packets are not lost.

Related results for the online unweighted matching problem in a bipartite
graph: In the online unweighted matching problem we have a bipartite
graph $G = (R, S, E)$ where $S$ and $R$ are the disjoint vertex sets and $E$ is the
edge set. At step $i$, vertex $r_i \in R$, together with all of its incident edges, arrives
online. The online algorithm can match vertex $r_i \in R$ to an adjacent vertex
$s_j \in S$, and the goal is to maximize the number of matched vertices. Karp et al.
[10] observed that in the integral unweighted matching problem in a bipartite
graph any deterministic algorithm which never refuses to match if possible is
2-competitive, and that no deterministic algorithm can be better than 2-competitive. In
addition, they introduced a randomized algorithm RANKING with a compet- 
itive ratio of — o(l) and proved a lower bound of — o(l), thus obtaining 
the optimality of RANKING, up to lower order terms. We improve the lower 
order terms in their lower bound. The online unweighted matching problem was 
generalized by Kalyanasundaram and Pruhs in [9] to the b-matching problem. 
In this model each vertex $r_i \in R$ can be matched to a single adjacent vertex
$s_j \in S$ such that every $s_j$ can be matched up to $b$ times. They showed an optimal

online algorithm Balance with competitiveness of , i A ^ 

Paper structure: Section 2 includes formal definitions and notation. In Section
3 we consider the online unweighted fractional matching and obtain an
upper bound for a natural online algorithm. We address the fractional version of
the switch throughput problem in Section 4 and obtain an upper bound for this
problem. In Section 5 we present our general technique for the discretization of
any fractional algorithm for the switch throughput problem and obtain an upper
bound for the resulting algorithm. Open problems are presented in Section 6.



2 Problem Definition and Notations 

We model the switch throughput maximization problem as follows. We are given
a switch with $m$ FIFO input queues, where queue $i$ has size $B_i$, and one output
port. Packets arrive online at the queues, every packet belonging to a specific
input queue. All packets are of equal size and value. We denote the online finite
packet sequence by $\sigma$. Initially, all $m$ queues are empty. Each time unit is divided
into two phases: in the arrival phase a set of packets arrives at specific input
queues and may be inserted into the queues if there is sufficient space. Remaining
packets must be discarded. In the transmission phase the switching algorithm
may select one of the non-empty queues, if such exists, and transmit the packet
at the head of that queue. The goal is to maximize the number of transmitted
packets. We consider the non-preemptive model, in which stored packets cannot
be discarded from the queues. We note that in the unit value model the admission
control question, i.e. which packets to accept, is simple, since there is no reason
to prefer one packet over another. Therefore, it is enough to focus on the
scheduling problem.

¹ In [9] and [10] the authors used a definition of the competitive ratio that is smaller than 1.

For a given time $t$, we use the term load of queue $i$ to refer to the number of
packets residing in that queue at that time. For a given time $t$, the term space of
queue $i$ is used to refer to its size minus its load, i.e. how many additional packets
can be assigned to queue $i$ at that time. Given an online switching algorithm $A$,
we denote by $A(\sigma)$ the value of $A$ given the sequence $\sigma$. We denote the optimal
(offline) algorithm by OPT, and use similar notation for it. For all maximization
online problems described in this paper (the unweighted fractional matching problem
in a bipartite graph, the $b$-matching problem in a bipartite graph, and the fractional
as well as the discrete version of the switch throughput maximization problem), a
deterministic online algorithm $A$ is $c$-competitive iff for every instance of the
problem and every packet sequence $\sigma$ we have $OPT(\sigma) \le c \cdot A(\sigma)$.

We prove our upper bound by assuming that all queues in the switch are
of equal size $B$. We note that this is done for simplicity of notation only. The
algorithms we present remain the same when queues have different sizes, where
in our upper bound, $B$ will be replaced by $\min_i\{B_i\}$.
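The discrete model above can be made concrete with a small simulator. This is a minimal sketch under our own assumptions: all queues have the same size `B`, `arrivals[t]` lists the queue index of every packet arriving in time unit `t`, admission is greedy (a packet is accepted iff its queue has space), and `policy` is any function that, given the current loads, returns the index of a non-empty queue to serve or `None`. The longest-queue-first policy is only an example of a work-conserving policy, not an algorithm from this paper.

```python
def simulate(m, B, arrivals, policy):
    """Return the number of packets transmitted on the finite sequence `arrivals`."""
    load = [0] * m
    sent = 0
    t = 0
    while t < len(arrivals) or any(load):
        # Arrival phase: greedy admission; packets that do not fit are lost.
        for q in (arrivals[t] if t < len(arrivals) else []):
            if load[q] < B:
                load[q] += 1
        # Transmission phase: at most one packet, from the head of a non-empty queue.
        i = policy(load)
        if i is not None:
            load[i] -= 1
            sent += 1
        elif t >= len(arrivals):
            break                  # policy stays idle after the last arrival
        t += 1
    return sent


def longest_queue_first(load):
    """Example work-conserving policy: serve a longest queue (ties: lowest index)."""
    if not any(load):
        return None
    return max(range(len(load)), key=load.__getitem__)
```

For example, `simulate(2, 2, [[0, 0, 0, 1, 1], [], [0]], longest_queue_first)` returns 5: the third packet for queue 0 is lost at time 0, and every accepted packet is eventually transmitted.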



3 Online Unweighted Fractional Matching in a Bipartite Graph

Our switching algorithm is based on an online fractional matching algorithm.
Thus we start by considering the online unweighted matching problem in a bipartite
graph, which is defined as follows. Consider an online version of the
maximum bipartite matching on a graph $G = (R, S, E)$, where $S$ and $R$ are the
disjoint vertex sets and $E$ is the edge set. We refer to set $R$ as the requests and
to set $S$ as the servers. The objective is to match requests to servers.
At step $i$, vertex $r_i \in R$, together with all of its incident edges, arrives online.
The response of $A$ can be either to reject $r_i$, or to irreversibly match it to an
unmatched vertex $s_j \in S$ adjacent to $r_i$. The goal of the online algorithm is to
maximize the size of the matching.

The fractional version of the online unweighted matching is as follows: each
request $r_i$ has a size $x_i$, which is the amount of work needed to service request
$r_i$. Algorithm $A$ can match a fraction of size $k_i^j \in [0, x_i]$ to each vertex $s_j \in S$
adjacent to $r_i$, where the fractions of $r_i$ sum to at most $x_i$. If request $i$ is
matched partially to some server $j$ with weight $k_i^j$, then we have to maintain that
the load of each server $j$, denoted by $l_j$, which is $\sum_{i=1}^{n} k_i^j$ where $n$ is the
length of $\sigma$, is at most 1. The goal of the online algorithm is to maximize
$\sum_{i=1}^{n}\sum_{j} k_i^j$, which is the size of the fractional matching $M$, denoted by $|M|$.

We show that a natural “water level” algorithm WL for the fractional matching
problem is $\frac{e}{e-1} \approx 1.58$-competitive, even against a fractional OPT. Intuitively,
given a request $r_i \in R$, algorithm WL flattens the load on the minimum
loaded servers adjacent to $r_i$.

Algorithm WL: For each request $r_i$, match a fraction of size $k_i^j$ to each adjacent
$s_j$, where $k_i^j = (h - l_j)^+$ and $h \le 1$ is the maximum number such that
$\sum_{s_j \text{ adjacent to } r_i} (h - l_j)^+ \le x_i$. By $(f)^+$ we mean $\max\{f, 0\}$.
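One step of WL can be sketched as follows; the function name and the list-based interface are our own. Given the current loads $l_j$ of the servers adjacent to $r_i$ and the size $x_i$, it sorts the loads, finds the level $h$ at which raising the least loaded servers to $h$ uses exactly $x_i$ (or stops at the cap $h = 1$), and returns the allocations $k_i^j = (h - l_j)^+$ in the original server order. Floating-point arithmetic is assumed.

```python
def water_level(loads, x):
    """One WL step: allocations k_j = (h - l_j)^+ for the servers adjacent to r_i.

    loads: current loads l_j of the adjacent servers (each at most 1).
    x:     size x_i of the arriving request.
    """
    ls = sorted(loads)
    h = 1.0
    for t in range(1, len(ls) + 1):
        # Level reached if the t least loaded servers absorb all of x.
        level = (x + sum(ls[:t])) / t
        next_load = ls[t] if t < len(ls) else float("inf")
        if level <= next_load:        # the (t+1)-th server stays dry
            h = min(level, 1.0)       # loads may never exceed 1
            break
    return [max(h - l, 0.0) for l in loads]
```

For example, `water_level([0.2, 0.5, 0.9], 0.5)` raises the level to $h = 0.6$ and returns allocations of approximately `[0.4, 0.1, 0.0]`; the most loaded server receives nothing.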

Theorem 1. Algorithm WL is $\frac{e}{e-1} \approx 1.58$-competitive.

Theorem 2. Algorithm WL is -competitive in a resource augmentation 
model, where the online algorithm can match a times more than the optimum 
algorithm on a single server. 

Remark 1. Our technique can be extended to analyze the $b$-matching problem
that was studied in [9]. Specifically, we simplified the proof of the competitiveness
of algorithm Balance for the $b$-matching problem, which was studied by
Kalyanasundaram and Pruhs [9].

4 The Maximizing Switch Throughput Problem - The Fractional Version

In this section we consider a fractional model, which is a relaxation of the discrete
model presented in Section 2. In the fractional model we allow
the online algorithm to accept fractions of packets into each queue if there is
sufficient space, and also to transmit fractions of packets from the head of each
queue, such that the sum of transmitted fractions is at most 1.

We assume that sequence $\sigma$ consists of integral packets. However, this restriction
is not obligatory for this section; it is relevant only for the
transformation of a fractional algorithm for the switch throughput problem into
a discrete scheduling algorithm. We focus on algorithms that accept packets
or fractional packets if sufficient space exists; since all packets have unit values
there is no reason to prefer one packet over another. We refer to such algorithms
as greedy admission control algorithms, and therefore focus on the scheduling
problem alone. We show that a fractional algorithm for the scheduling problem
is $\frac{e}{e-1} \approx 1.58$-competitive, even against a fractional adversary. We begin by
introducing a translation of our problem (the fractional model) into the problem
of online unweighted fractional matching in a bipartite graph.²

Given a sequence $\sigma$, we translate it into the bipartite graph $G^\sigma = (R, S, E)$,
which is defined as follows:

— Let $T$ denote the latest time unit in $\sigma$ in which a packet arrives. We define
the set of time nodes as $R = \{r_1, \ldots, r_{T+mB}\}$. Each $r_i \in R$ has unit size,
i.e. $x_i = 1$ for each $1 \le i \le T + mB$.

— Let $P$ be the total number of packets specified in $\sigma$. We define the set of
packet nodes as $S = \{s_1, \ldots, s_P\}$.

— Let $P_i^t$ denote the set of the last $B$ packets that arrive at queue $q_i$ until
time $t$ (inclusive). Define $P^t = \bigcup_{i=1}^{m} P_i^t$. We define the set of edges in $G^\sigma$ as
follows: $E = \{(r_t, s_p) \mid p \in P^t\}$.

² A similar translation was presented in [4] for integral scheduling.

Note that in $G^\sigma$, time and packet nodes arrive in an online fashion. Thus
the problem is how to divide the time among the packets, where edges connect
each of the time nodes to the packet nodes which correspond to packets that can
be resident in some queue at time $t$. For consistency with Section 3 we refer to
the time nodes as requests and to the packet nodes as servers.

Remark 2. We note that in contrast to the matching model that was presented
in Section 3, here the servers as well as the requests arrive online. We emphasize
that this does not change the results obtained in Section 3, since the servers
which arrive at time $t$ only connect to requests $r_{t'}$ with $t' \ge t$, and thus can be
viewed as servers which were present but not yet adjacent to requests $r_{t'}$ with $t' < t$.

Before we proceed we introduce some new definitions. 

Definition 1. A schedule $SC$ for a sequence of arriving packets $\sigma$, for the switch
throughput problem, is a set of triplets of the form $(t, q_i, k_i^t)$ for each $1 \le i \le m$,
where queue $q_i$ is scheduled for the transmission of a fraction of size $k_i^t \ge 0$ at
time $t$. The size of the schedule, denoted by $|SC|$, is the total amount of fractions
scheduled for transmission, i.e. $\sum_t \sum_i k_i^t$.

Definition 2. A schedule $SC$ for a sequence $\sigma$ is called legal if for every triplet
$(t, q_i, k_i^t)$, queue $q_i$ has a load of at least $k_i^t$ at time $t$, and $\sum_i k_i^t \le 1$.

The following lemmas connect bipartite fractional matching to our problem. 

Lemma 1. Every legal fractional schedule $SC$ for the sequence $\sigma$ can be mapped,
in an online fashion, to a fractional matching $M$ in $G^\sigma$ such that $|SC| = |M|$.

Lemma 2. Every matching $M$ in $G^\sigma$ can be translated, in an online fashion
in polynomial time, to a legal schedule $SC$ for $\sigma$ such that $|SC| = |M|$, while
performing greedy admission control.

The following corollary follows directly from Lemmas 1 and 2.

Corollary 1. For any sequence $\sigma$, the size of the optimal fractional schedule for
$\sigma$ is equal to the size of a maximum fractional matching in $G^\sigma$. In particular,
an optimal fractional schedule (whose size equals that of an optimal integral matching)
can be found (offline) in polynomial time.

Now, we present a fractional scheduling algorithm EP for maximizing switch
throughput in the relaxed model. Since the bipartite unweighted fractional
matching problem is connected to the maximizing switch throughput problem,
algorithm EP intuitively bases its scheduling decisions on the matching constructed
by algorithm WL, which was presented in Section 3, in the online
constructed graph $G^\sigma$.
Algorithm EP:

— Maintain a running simulation of WL in the online constructed graph $G^\sigma$.

— Admission control: perform greedy admission control.

— Scheduling: transmit from the head of queue $i$ the total size of the fractions
that were matched from $r_t$ to servers in $P_i^t$.

Remark 3. We note that algorithm EP actually constructs a legal schedule for
$\sigma$ incrementally, as shown by Lemma 2.

Theorem 3. For every sequence $\sigma$, $OPT(\sigma) \le \frac{e}{e-1} EP(\sigma)$.

5 The Maximizing Switch Throughput Problem 

In this section we consider the maximizing switch throughput problem. Given a
sequence $\sigma$ which consists of integral packets, we present a generic technique to
transform any $c$-competitive online fractional algorithm for the switch throughput
problem into a discrete algorithm with a competitive ratio of $c\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$,
where $H_m = \sum_{i=1}^{m} \frac{1}{i} \approx \ln m$ is the $m$th harmonic number. In particular, we take
the fractional scheduling algorithm EP, which was presented in Section 4, with
a competitive ratio of $\frac{e}{e-1}$, and transform it into an online discrete scheduling
algorithm with a competitive ratio of $\frac{e}{e-1}\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$. Thus, the competitive
ratio of the discrete algorithm asymptotically approaches $\frac{e}{e-1} \approx 1.58$ for large
queues $B$. We start by defining the cost scheduling problem in the next
subsection.
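As a quick numeric illustration (our own arithmetic, not a result stated in the paper), the snippet below evaluates the ratio $\frac{e}{e-1}\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$ of the discretized algorithm and can be compared with the $\frac{17}{9} \approx 1.89$ ratio of [2]. With $m = 16$ queues, $H_{16} \approx 3.38$, so the ratio is about 2.215 for $B = 10$ (worse than 1.89) but about 1.645 for $B = 100$.

```python
import math


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def discrete_ratio(m, B, c=math.e / (math.e - 1)):
    """Competitive ratio c * (1 + (floor(H_m) + 1) / B) of the discretized algorithm."""
    return c * (1 + (math.floor(harmonic(m)) + 1) / B)


for B in (10, 100, 1000):
    print(B, round(discrete_ratio(16, B), 3))
# 10 2.215
# 100 1.645
# 1000 1.588
```

The additive $\lfloor H_m \rfloor + 1$ term is divided by $B$, which is why the bound only beats 1.89 once $B$ is large compared with $\log m$.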

5.1 The Cost Scheduling Problem 

In this model queues are unbounded and packets must be inserted by the switch- 
ing algorithm into each queue. The goal is to minimize the maximum cost, i.e. 
minimize the maximum queue size over time. 

We now turn to consider a relaxation of the model and allow an online algorithm
to transmit fractional packets waiting at the head of different non-empty
queues, if they exist, provided that the total size of the transmitted fractions
in one unit of time does not exceed 1. We denote by $A$ the online fractional
algorithm. We now introduce a general technique for the discretization of the
scheduling of algorithm $A$, given a finite sequence of integral packets $\sigma$, while
adding an additive term of at most $H_m$ to the cost of $A$. We start with the
following definition and lemmas.

Definition 3. An online fractional algorithm $A$ is work-conserving if in each
unit of time it transmits a total size of 1 if its queues hold at least that much,
and transmits its entire load otherwise.

Lemma 3. Every algorithm $A$ can be transformed into a work-conserving algorithm
$A'$ without worsening its performance on any sequence (in particular,
$c_{A'} \le c_A$, where $c_{A'}$ and $c_A$ are the competitive ratios of $A'$ and $A$, respectively).
Henceforth we will deal exclusively with work-conserving algorithms, even
when we neglect to say so explicitly. Now, we define algorithm $M$, which gets
an algorithm $A$ as a parameter, and denote it by $M^A$. At each time unit, in
order to decide which queue to serve, $M^A$ computes the load seen by $A$ and
uses this information to make its decision. Given a time unit $t$, let $l_i^A$ and $l_i$ be
the simulated load of $A$ and the actual load of $M^A$ on queue $i$ at that time,
respectively. The residual load of queue $i$ at time unit $t$ is defined as $l_i - l_i^A$.
Intuitively, algorithm $M^A$ transmits at each time unit from a queue with
maximum residual load. Before we present the exact definition of algorithm
$M^A$ we define the following:

Definition 4. The mid-state of each time unit is the state of the queues after
algorithm $A$ transmits its fractional packets from some queues, and just before
algorithm $M^A$ transmits its packet from some queue.

Algorithm $M^A$: Maintain a running simulation of algorithm $A$. At each time
unit, transmit a packet from a queue which has the maximum residual load at
the mid-state (breaking ties arbitrarily), unless all queues are empty.

Theorem 4. The cost of $M^A$ is at most the cost of $A$ plus $H_m$.
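A minimal sketch of $M^A$ on the cost problem follows. Beyond the rule "serve a maximum residual load at the mid-state", everything here is our assumption: the fractional algorithm $A$ is passed in as a function returning per-queue transmitted fractions; the example `proportional` policy (hypothetical, not from the paper) shares one unit of transmission among the queues in proportion to their loads, which is work-conserving; exact `Fraction` arithmetic avoids rounding; the maximum residual is taken over non-empty queues so the sketch never serves an empty queue; and cost is measured as the largest load right after an arrival phase.

```python
from fractions import Fraction


def proportional(loads):
    """Hypothetical work-conserving fractional policy: share one unit by load."""
    total = sum(loads)
    if total == 0:
        return [Fraction(0)] * len(loads)
    budget = min(Fraction(1), total)
    return [l * budget / total for l in loads]


def run_m(arrivals, m, frac_policy):
    """Simulate A and M^A on `arrivals`; return (cost of A, cost of M^A)."""
    la = [Fraction(0)] * m      # simulated loads l_i^A of A
    l = [0] * m                 # actual integral loads l_i of M^A
    cost_a = cost_m = 0
    for batch in arrivals:
        for q in batch:         # cost problem: every packet must be inserted
            la[q] += 1
            l[q] += 1
        cost_a = max(cost_a, max(la))
        cost_m = max(cost_m, max(l))
        sent = frac_policy(la)
        la = [a - s for a, s in zip(la, sent)]          # mid-state of A
        nonempty = [i for i in range(m) if l[i] > 0]
        if nonempty:
            # Serve a queue with maximum residual load l_i - l_i^A (first on ties).
            i = max(nonempty, key=lambda j: l[j] - la[j])
            l[i] -= 1
    # After the last arrival loads only decrease, so the costs are final.
    return cost_a, cost_m
```

On the input `[[0], [1], [0, 1], [2, 2, 2], []]` with $m = 3$, both costs come out as 3, well within the $H_3 = 11/6$ slack of Theorem 4.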

Remark 4. An alternative proof of a slightly weaker bound, $\lfloor \log_2 m \rfloor$ plus the
cost of $A$, can be obtained non-explicitly using [6].

5.2 Discretization of the Fractional Scheduling 

Given a finite sequence of integral packets $\sigma$, we present a general technique
to transform any $c$-competitive fractional algorithm for the switch throughput
problem into a discrete algorithm with a competitive ratio of $c\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$.
In particular, we will apply this technique to the discretization of algorithm EP,
which was presented in Section 4. It can easily be shown that any algorithm
$A$ can be transformed into a greedy admission control algorithm $A'$ without
worsening its performance on any sequence (in particular, $c_{A'} \le c_A$, where $c_{A'}$ and
$c_A$ are the competitive ratios of $A'$ and $A$, respectively); we therefore focus on
greedy admission control algorithms.

For our discretization process we rely on the results for the cost problem,
which were presented in Subsection 5.1. We want to treat the packets which
were accepted by EP as the input sequence $\sigma$ for the cost problem studied in
Subsection 5.1, in a manner which is yet to be seen. Recall that algorithm EP is a
greedy admission control algorithm. Thus, EP might accept fractional packets
due to insufficient queue space. Since in the model of the cost problem we study
the case where $\sigma$ consists of integral packets, we want EP to accept packets only
integrally. Hence, we continue by considering the following problem: assume we
are given an online $c$-competitive algorithm $A$ with greedy admission control. We
want to produce a competitive algorithm $\bar{A}$ which assigns only integral packets.
We start by assuming that algorithm $\bar{A}$ has queues of size $B + 1$ (algorithm
$A$ maintains queues of size $B$); we shall get rid of this assumption later on.
Intuitively, $\bar{A}$ emulates the scheduling of $A$ and accepts only integral packets.
We start with a definition and an observation before we define $\bar{A}$ formally.

Definition 5. We refer to algorithms that accept integral packets whenever there is
sufficient space as discrete greedy admission control algorithms.

Observation 1. Every greedy admission control algorithm assigns fractions to
a queue only when the load on that queue is strictly above $B - 1$.

We now define the transformation of a given greedy admission control algorithm
$A$ with queues of size $B$ into algorithm $\bar{A}$ with queues of size $B + 1$ which
only accepts integral packets.

Algorithm $\bar{A}$:

— Maintain a running simulation of $A$. Let $k_i^t$ be the fraction size which was
transmitted by algorithm $A$ at time $t$ from queue $i$.

— Admission control: perform discrete greedy admission control.

— Scheduling: for each queue $i$ transmit a fraction of size $k_i^t$.

Let $l_i^t$ and $\bar{l}_i^t$ be the loads of queue $i$ in a given time unit $t$ in $A$ and $\bar{A}$,
respectively. For the algorithm to be well defined, i.e. $\bar{l}_i^t \ge k_i^t$ after the arrival
phase, we must prove the next lemma.

Lemma 4. For each queue $i$ and time unit $t$, after the arrival phase $\bar{l}_i^t \ge l_i^t$.

Corollary 2. Algorithm $\bar{A}$ can always transmit $k_i^t$ as defined.

Theorem 5. For a given algorithm $A$ with queues of size $B$, algorithm $\bar{A}$ with
queues of size $B + 1$ has the same throughput given the same sequence $\sigma$.

Proof. By Corollary 2, at each time unit $t$, $\bar{A}$ transmits the same total size as
algorithm $A$.
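The emulation can be checked on small inputs with the sketch below, which is our own illustration under stated assumptions: $A$ uses fractional greedy admission with queues of size `B` and the hypothetical work-conserving `proportional` scheduling rule (repeated so the block stands alone), while $\bar{A}$ uses queues of size `B + 1`, accepts a packet only if a whole packet fits, and copies $A$'s transmissions $k_i^t$. The `assert` inside the loop checks Lemma 4 at every step, and the returned value is the common throughput of Theorem 5.

```python
from fractions import Fraction


def proportional(loads):
    """Hypothetical work-conserving fractional policy: share one unit by load."""
    total = sum(loads)
    if total == 0:
        return [Fraction(0)] * len(loads)
    budget = min(Fraction(1), total)
    return [l * budget / total for l in loads]


def emulate(arrivals, m, B, policy=proportional):
    """Run A (size B, fractional admission) and A-bar (size B + 1, integral admission).

    Assumes `policy` is work-conserving, so that A's queues eventually drain.
    """
    la = [Fraction(0)] * m      # loads l_i^t of A
    lb = [Fraction(0)] * m      # loads of A-bar
    throughput = Fraction(0)
    t = 0
    while t < len(arrivals) or any(la):
        for q in (arrivals[t] if t < len(arrivals) else []):
            la[q] += min(Fraction(1), B - la[q])   # greedy: accept whatever fits
            if lb[q] + 1 <= B + 1:                 # discrete greedy: whole packets only
                lb[q] += 1
        assert all(b >= a for a, b in zip(la, lb)), "Lemma 4 violated"
        k = policy(la)                             # fractions k_i^t sent by A
        la = [a - x for a, x in zip(la, k)]
        lb = [b - x for b, x in zip(lb, k)]        # feasible since k_i^t <= l_i^t <= lb
        throughput += sum(k)
        t += 1
    return throughput
```

For example, `emulate([[0, 0, 1], [0, 0]], 2, 2)` returns `Fraction(11, 3)`: at time 1, $A$ accepts only $2/3$ of a packet into queue 0, while $\bar{A}$ accepts a whole packet into its larger queue.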

Recall that EP is not a work-conserving algorithm. Therefore $\overline{EP}$, which
emulates the scheduling of EP, is also not a work-conserving algorithm. Since we
want to use some of the results for the cost problem in Subsection 5.1, we first
need to transform algorithm $\overline{EP}$ into a work-conserving algorithm EP'. We use
the transformation presented by Lemma 3 in Subsection 5.1.

Lemma 5. Every non-work-conserving algorithm A can be transformed into a work-conserving algorithm A' which accepts only packets which are accepted by A, while not worsening the performance on any sequence (in particular, $c_{A'} \le c_A$, where $c_{A'}$ and $c_A$ are the competitive ratios of A' and A, respectively).

Now, we consider Algorithm M which was presented in Section 5.1. We assume that M has queues of size $B + 1 + \lfloor H_m \rfloor$. Recall that the precondition of M was that the parameter algorithm is work-conserving. Hence, we apply algorithm M on algorithm $\widetilde{EP}'$ and denote the resulting algorithm by $M_{\widetilde{EP}'}$. The sequence of packets given to $M_{\widetilde{EP}'}$ consists solely of the packets accepted by $\widetilde{EP}'$. Note that this sequence consists exclusively of integral packets. We emphasize that $M_{\widetilde{EP}'}$ is a discrete algorithm, since it only transmits integral packets. We now present the following lemma.
Lemma 6. Algorithm $M_{\widetilde{EP}'}$ with queues of size $B + 1 + \lfloor H_m \rfloor$ has the same throughput as algorithm $\widetilde{EP}'$ with queues of size B + 1, given the same sequence.

We now return to our original model where queues are of size B. By Lemma 6, if $M_{\widetilde{EP}'}$ had queues of size $B + \lfloor H_m \rfloor + 1$, then algorithms $M_{\widetilde{EP}'}$ and $\widetilde{EP}'$ would have equal throughput. Unfortunately, this is not the case, so we continue by emulating an algorithm with large queues by an algorithm with small queues. Specifically, assume we are given an online competitive discrete algorithm A with queues of size y, and we want to produce a competitive algorithm $E^A$ with queues of size $y' < y$. Intuitively, $E^A$ tries to emulate the schedule of algorithm A and accepts only packets which A accepts, as long as the queue is not full. We emphasize that if algorithm A were a fractional algorithm, this problem could easily be solved by scaling the accepted volume and the transmitted volume of A by $y'/y$. Thus we continue by considering only discrete algorithms.
Algorithm $E^A$:

— Maintain a running simulation of algorithm A. 

— Admission control: accept a packet if A accepts it and the queue is not 
full. 

— Scheduling: transmit from the same queue as A does, if that queue is not empty.

We show that the competitive ratio of $E^A$ is at most the competitive ratio of A times the ratio $y/y' > 1$.

Theorem 6. Let A maintain queues of size y and let $E^A$ maintain queues of size y' such that $y' < y$. Then $A(\sigma) \le \frac{y}{y'} E^A(\sigma)$.
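The following Python sketch implements the emulation for a switch with m unit-packet queues (written E^A in the code) and checks the bound of Theorem 6 numerically. The discrete algorithm A is a hypothetical stand-in, not taken from the paper: it uses greedy admission and always transmits from its longest queue. Any other discrete algorithm could be plugged in the same way.

```python
import random


def emulate_small_queues(arrivals, y, y_small, m):
    """Discrete algorithm A with queues of size y, emulated by E^A with queues y_small < y.

    A is a hypothetical stand-in: discrete greedy admission control, and it
    transmits one packet per time step from its longest queue.
    E^A accepts a packet only if A accepts it and E^A's own queue is not full,
    and transmits from the queue A transmits from, if that queue is nonempty.
    Returns the number of packets transmitted by A and by E^A.
    """
    a = [0] * m            # queue lengths of A
    e = [0] * m            # queue lengths of E^A
    sent_a = sent_e = 0
    for batch in arrivals:
        for i in batch:
            if a[i] < y:                   # A accepts the packet
                a[i] += 1
                if e[i] < y_small:         # E^A follows A while it has room
                    e[i] += 1
        i = max(range(m), key=lambda j: a[j])
        if a[i] > 0:                       # A transmits from queue i
            a[i] -= 1
            sent_a += 1
            if e[i] > 0:                   # E^A copies the decision if it can
                e[i] -= 1
                sent_e += 1
    return sent_a, sent_e


rng = random.Random(7)
arrivals = [[rng.randrange(3) for _ in range(rng.randrange(5))] for _ in range(200)]
sent_a, sent_e = emulate_small_queues(arrivals, y=6, y_small=4, m=3)
# Theorem 6 predicts sent_a <= (6 / 4) * sent_e.
print(sent_a, sent_e)
```

Intuitively, A can only pull ahead of $E^A$ on a queue after $E^A$ has filled that queue and then drained all $y'$ packets from it. During such a stretch, A gains at most $y - y'$ extra packets.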

We prove the main result of this paper with the next theorem. 

Theorem 7. The competitive ratio of algorithm $E^{M_{\widetilde{EP}'}}$ for the maximizing switch throughput problem is $\frac{e}{e-1}\left(1 + \frac{\lfloor H_m \rfloor + 1}{B}\right)$.

Corollary 3. For $B \gg H_m$ the competitive ratio of $E^{M_{\widetilde{EP}'}}$ approaches $\frac{e}{e-1} \approx 1.58$.
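The sketch below shows how quickly the queue-size penalty vanishes. It evaluates the factor $y/y'$ of Theorem 6 with $y = B + 1 + \lfloor H_m \rfloor$ (the queue size in Lemma 6) and $y' = B$, and multiplies it by $e/(e-1)$, the limit given in Corollary 3. Treating $e/(e-1)$ as the ratio of the emulated algorithm is an assumption made here for illustration. The value $m = 64$ is arbitrary.

```python
import math
from fractions import Fraction


def floor_harmonic(m):
    """floor(H_m), where H_m = 1 + 1/2 + ... + 1/m, computed exactly."""
    return math.floor(sum(Fraction(1, k) for k in range(1, m + 1)))


def queue_factor(B, m):
    """y / y' from Theorem 6, with y = B + 1 + floor(H_m) (Lemma 6) and y' = B."""
    return (B + 1 + floor_harmonic(m)) / B


base = math.e / (math.e - 1)     # about 1.582, the limit in Corollary 3
for B in (10, 100, 1000, 10000):
    print(B, round(base * queue_factor(B, m=64), 4))
```

Because $H_m$ grows only like $\ln m$, the overhead term $(\lfloor H_m \rfloor + 1)/B$ becomes negligible once B is much larger than $\log m$.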

6 Discussion and Open Problems 

— Albers and Schmidt [2] show a lower bound of $\frac{e}{e-1}$ for $m \gg B$. They also show an algorithm with competitive ratio $\frac{17}{9} \approx 1.89$ for every $B > 1$. Our main result in this paper shows that for the case where $B \gg \log m$ our algorithm can do much better than 1.89. Our algorithm approaches a competitive ratio of $\frac{e}{e-1}$, which, interestingly, is the lower bound for $m \gg B$. Azar and Richter [4] show a lower bound of approximately 1.366 for any B and m, which is smaller than the lower bound of $\frac{e}{e-1}$. Therefore, there is still work to be done to determine the optimal competitive ratio for arbitrary B and m.

— It is an interesting question whether greedy algorithms which transmit from the longest queue first have a competitive ratio smaller than 2 for $B \gg m$.
— For $B \gg \log m$ our general technique shows that the competitive ratio of the discrete version of the algorithm approaches the performance of a fractional algorithm. Hence, it is interesting to determine the best fractional algorithm for large queues.
Abstract. The problem of maximizing the weighted throughput in various switching settings has been intensively studied recently through competitive analysis. To date, the most general model that has been investigated is the standard CIOQ (Combined Input and Output Queued) switch architecture with internal fabric speedup $S \ge 1$. CIOQ switches, which comprise the backbone of packet routing networks, are $N \times N$ switches controlled by a switching policy that incorporates two components: admission control and scheduling. An admission control strategy is essential to determine the packets stored in the FIFO queues in input and output ports, while the scheduling policy conducts the transfer of packets through the internal fabric, from input ports to output ports. The online problem of maximizing the total weighted throughput of CIOQ switches was recently investigated by Kesselman and Rosen in [12]. They presented two different online algorithms for the general problem that achieve non-constant competitive ratios (linear in either the speedup or the number of distinct values, or logarithmic in the value range). We introduce the first constant-competitive algorithm for the general case of the problem, with arbitrary speedup and packet values. Specifically, our algorithm is 9.47-competitive, and is also simple and easy to implement.



1 Introduction 

Overview: Recently, packet routing networks have become the dominant platform for data transfer. The backbone of such networks is composed of $N \times N$ switches that accept packets through multiple incoming connections and route them through multiple outgoing connections. As network traffic continuously increases and traffic patterns constantly change, switches routinely have to cope efficiently with overloaded traffic, and are forced to discard packets due to insufficient buffer space, while attempting to forward the more valuable packets to their destinations.

Traditionally, the performance of queuing systems has been studied within the stability analysis framework, either by a probabilistic model for packet injection (queuing theory, see e.g. [7,14]) or an adversarial model (adversarial queuing theory, see e.g. [4,8]). In stability analysis packets are assumed to be identical, and the goal is to determine queue sizes such that no packet is ever dropped. However, real-world networks do not usually conform to the above assumptions, and it seems inevitable to drop packets in order to maintain efficiency. As a result, the competitive analysis framework, which avoids any assumptions on the input sequence and compares the performance of online algorithms to the optimal solution, has recently been adopted for studying throughput maximization problems. Initially, online algorithms for single-queue switches were studied in various settings [1,3,10,11,13]. Later on, switches with multiple input queues were investigated [2,5,6], as well as CIOQ switches with multiple input and output queues [12].

To date, the most general switching model that has been studied using competitive analysis is the CIOQ (Combined Input and Output Queued) switching architecture. A CIOQ switch with speedup $S \ge 1$ is an $N \times N$ switch, with N input ports and N output ports. The internal fabric that connects the input and output FIFO queues is S times faster than the queues. A switching policy for a CIOQ switch consists of two components. The first is an admission control policy that determines the packets stored in the bounded-capacity queues. The second is a scheduling strategy that decides which packets are transferred from input ports to output ports through the intermediate fabric at each time step. The goal is to maximize the total value of packets transmitted from the switch. The online problem of maximizing the total throughput of a CIOQ switch was studied by Kesselman and Rosen in [12]. For the special case of unit-value packets (all packets have the same value) they presented a greedy algorithm that is 2-competitive for a speedup of 1 and 3-competitive for any speedup. For the general case of packets with arbitrary values two different algorithms were presented, using techniques similar to those introduced in [5]. The first algorithm is $4S$-competitive and the second one is $(8 \min\{k, 2\lceil \log \alpha \rceil\})$-competitive, where k is the number of distinct packet values and $\alpha$ is the ratio between the largest and smallest values.

Our results: We present the first constant-competitive algorithm for the general 
case of the problem with arbitrary packet values and any speedup. Specifically, 
our algorithm is 9.47-competitive and is also simple and easy to implement. Our 
analysis includes a new method to map discarded packets to transmitted packets 
with the aid of generating dummy packets. We note that our algorithm and its 
analysis apply almost as-is to less restrictive (and less common) models of CIOQ 
switches, such as switches with priority queues rather than FIFO queues. 

Other related results: The online problem of throughput maximization in switches has been explored extensively during recent years. Initially, single-queue switches were investigated, for both arbitrary packet sequences and 2-value sequences. The preemptive model, in which packets stored in the queue can be discarded, was studied in [10,11,13]. The non-preemptive model, in which a packet can be discarded only upon arrival, was initially studied by Aiello et al. [1], followed by Andelman et al. [3], who showed tight bounds. These results for a single queue were generalized to switches with an arbitrary number of input queues in [5], by a general reduction from the multi-queue model to the single-queue model. Specifically, a 4-competitive algorithm is presented in [5] for the weighted multi-queue switch problem. An improved 3-competitive algorithm was recently shown in [6]. The multi-queue switch model has also been investigated for the special case of unit-value packets, which corresponds to IP networks. Albers and Schmidt [2] recently introduced a deterministic 1.89-competitive algorithm for the unit-value problem; a randomized 1.58-competitive algorithm was previously shown in [5]. An alternative to the multiple input queues model is the shared memory switch, in which memory is shared among all queues. Hahne et al. [9] studied buffer management policies in this model, focusing on deriving upper and lower bounds for the natural Longest Queue Drop policy.



2 Problem Definition and Notations 

In the online CIOQ (Combined Input and Output Queued) switch problem, originally introduced in [12], we are given an $N \times N$ switch with N input ports and N output ports (see figure 1). Input port i ($i = 1, \ldots, N$) contains N virtual output FIFO queues (referred to as input queues henceforth), denoted $\{VO_{ij}\}_{j=1}^{N}$, with bounded capacities, where queue $VO_{ij}$ is used to buffer the packets arriving at input port i and destined to output port j. Output port j ($j = 1, \ldots, N$) contains a bounded-capacity FIFO queue (also referred to as output queue), denoted $O_j$, to buffer the packets waiting to be transmitted through output port j. For each queue Q in the system, we denote by $B(Q)$ the capacity of the queue, and by $Q(t)$ the ordered set of packets stored in the queue at time step t. We denote by $h(Q(t))$ and $\min(Q(t))$ the packet at the head of queue Q and the minimal packet value in Q, respectively.




Fig. 1. CIOQ switch - an example 



Time proceeds in discrete time steps. We divide each time step into three phases. The first phase is the arrival phase, during which packets arrive online at the input ports. All packets have equal size, and each packet is labelled with a value, an input port, through which it enters the switch, and a destination output port, through which it has to leave the switch. We denote the finite sequence of online arriving packets by $\sigma$. For every packet $p \in \sigma$ we denote by $v(p)$, $in(p)$ and $out(p)$ its value, input port and output port, respectively. Online arriving packets can be enqueued in the input queue that corresponds to their destination port, without exceeding its capacity. Remaining packets must be discarded. We allow preemption, i.e. packets previously stored in the queues may be discarded in order to free space for newly arrived packets. The second phase at each time step is the scheduling phase, during which packets are transferred from the input queues to the output queues. For a switch with speedup $S \ge 1$, at most S packets can be removed from each input port and at most S packets can be enqueued into the queue of each output port. This is done in S consecutive rounds, in each of which we compute a matching between input and output ports. We denote round s ($s = 1, \ldots, S$) at time step t by $t_s$. During the scheduling phase the capacity of the output queues may not be exceeded, and again preemption is allowed. The third phase at each time step is the transmission phase. During this phase a packet may be transmitted from the head of each output queue.

A switching algorithm A for the CIOQ switch problem is composed of two components, not necessarily independent. The first component is admission control: algorithm A has to exercise an admission control strategy in all queues (input and output) that decides which packets are admitted into the queues and which packets are discarded in case of overflow. The second component is a scheduling strategy: during scheduling round $t_s$ algorithm A first decides the set of eligible packets for transfer $E^A(t_s) \subseteq \{h(VO_{ij}(t_s)) : (i,j) \in [N]^2\}$, i.e. a subset of the packets currently located at the head of the input queues. The set $E^A(t_s)$ can be translated into the bipartite weighted graph $G^A(t_s) = ([N], [N], E)$ such that $E = \{(in(p), out(p)) : p \in E^A(t_s)\}$, where the weights on the edges correspond to the packet values, namely $w(e) = v(p)$ for $e = (in(p), out(p)) \in E$. Algorithm A then constructs a matching $M^A(t_s) \subseteq G^A(t_s)$ that corresponds to the set of packets that are transferred from input queues to output queues at round $t_s$. Given a finite sequence $\sigma$ we denote by $A(\sigma)$ the set of packets algorithm A transmitted from the switch. The goal is to maximize the total value of transmitted packets, denoted $V(A(\sigma))$, i.e. maximize $V(A(\sigma)) = \sum_{p \in A(\sigma)} v(p)$. An online algorithm A is said to be c-competitive if and only if for every packet sequence $\sigma$, $V(\mathrm{Opt}(\sigma)) \le c \cdot V(A(\sigma))$ holds, where Opt denotes the optimal off-line algorithm for the problem.



3 The Algorithm 

In this section we define and analyze our algorithm for the CIOQ switch problem. We begin with a definition of a parameterized preemptive admission control policy $GR_\beta$ for a single queue (figure 2). This policy is a generalization of the ordinary greedy policy from [10], which is obtained by setting $\beta = 1$. The latter will be denoted by GR. We then present our parameterized algorithm $SG(\beta)$ (abbreviation for Semi-Greedy($\beta$)) for the problem (figure 3). For simplicity of notation, we drop the parameter $\beta$ for the remainder of this section, and use the abbreviated notation SG for the algorithm. The value of the parameter $\beta$ will be determined later.

Algorithm $GR_\beta$ [Single-Queue]

Enqueue a new packet p if:

— $|Q(t)| < B(Q)$ (the queue is not full).

— Or $v(p) > \beta \cdot \min(Q(t))$. In this case the smallest packet is discarded and p is enqueued.

Fig. 2. Algorithm $GR_\beta$.

Algorithm $SG(\beta)$ [CIOQ switch]

1. Admission control:

a) Input queues: use algorithm GR.

b) Output queues: use algorithm $GR_\beta$.

2. Scheduling: at scheduling round $t_s$ do:

a) Set $E^{SG}(t_s) = \{h(VO_{ij}(t_s)) : |O_j(t_s)| < B(O_j) \text{ or } v(h(VO_{ij}(t_s))) > \beta \cdot \min(O_j(t_s))\}$.

b) Compute a maximum weighted matching $M^{SG}(t_s) \subseteq G^{SG}(t_s)$.

Fig. 3. Algorithm $SG(\beta)$.
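The single-queue policy $GR_\beta$ of Figure 2 can be sketched in Python as follows. A queue is represented as a list of packet values ordered from head to tail. The function name and its return convention are illustrative choices, not part of the paper. Setting `beta=1.0` gives the ordinary greedy policy GR.

```python
def gr_beta_enqueue(queue, capacity, value, beta=1.0):
    """Admission control GR_beta (Fig. 2) for a FIFO queue of packet values.

    queue is a list ordered from head to tail and is modified in place;
    capacity is assumed to be at least 1. Returns (accepted, discarded),
    where discarded is the value of the preempted packet or None.
    """
    if len(queue) < capacity:            # the queue is not full
        queue.append(value)
        return True, None
    smallest = min(queue)
    if value > beta * smallest:          # preempt the smallest packet
        queue.remove(smallest)           # removes the first packet with that value
        queue.append(value)
        return True, smallest
    return False, None


q = []
for v in (1, 5, 3, 4):
    print(v, gr_beta_enqueue(q, capacity=2, value=v, beta=2.0), q)
```

With $\beta = 2$, the packet of value 3 preempts the packet of value 1. The packet of value 4 is rejected because $4 \le 2 \cdot 3$.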

Algorithm SG exercises the greedy preemptive admission control policy GR on all input queues, and the modified admission control policy $GR_\beta$ on all output queues. At each scheduling round SG considers only those packets that would be admitted to their destined output queue if transferred. These are packets whose destined output queue is either not full, or whose value is greater than $\beta$ times the currently smallest packet in this queue. Algorithm SG then computes a maximum weighted matching in the corresponding bipartite graph. Note that, according to the operation of SG, packets can be discarded from output queues during each scheduling round. We now proceed to prove the competitive ratio of SG.
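One scheduling round of SG (step 2 of Figure 3) can be sketched in Python as an eligibility filter followed by a maximum weight bipartite matching. This is a toy sketch under stated assumptions: ports are numbered from 0, only head-packet values are passed in, and the matching is found by brute force over permutations. That is only practical for very small N; a real implementation would use a polynomial-time weighted matching algorithm instead. The subsequent $GR_\beta$ admission at the output queues is not shown.

```python
from itertools import permutations


def sg_round(heads, out_queues, out_capacity, beta):
    """One scheduling round t_s of SG: eligibility filter plus max-weight matching.

    heads: dict mapping (i, j) to the value of the head packet of VO_ij
           (only nonempty input queues appear); ports are numbered from 0.
    out_queues: dict mapping j to the list of packet values in O_j.
    Returns the set of (i, j) edges in a maximum weight matching of G^SG(t_s).
    """
    # Step 2a: keep head packets that GR_beta would admit into their output queue.
    eligible = {
        (i, j): v
        for (i, j), v in heads.items()
        if len(out_queues[j]) < out_capacity or v > beta * min(out_queues[j])
    }
    if not eligible:
        return set()
    # Step 2b: brute-force maximum weight bipartite matching (tiny N only).
    # Every matching extends to a permutation, so scanning permutations and
    # keeping only eligible edges finds an optimal matching.
    n = 1 + max(max(edge) for edge in eligible)
    best_weight, best_edges = -1, set()
    for perm in permutations(range(n)):
        edges = {(i, perm[i]) for i in range(n) if (i, perm[i]) in eligible}
        weight = sum(eligible[e] for e in edges)
        if weight > best_weight:
            best_weight, best_edges = weight, edges
    return best_edges


heads = {(0, 0): 5, (0, 1): 4, (1, 0): 3}
print(sg_round(heads, {0: [], 1: []}, out_capacity=2, beta=2.0))
```

In the example, the single heaviest packet (value 5) loses to the pair of values 4 and 3, because the matching maximizes the total value transferred in the round.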

Theorem 1. Algorithm SG achieves a constant competitive ratio. Specifically, for an optimal choice of the parameter $\beta$, algorithm SG is 9.47-competitive.

Proof. The outline of the proof is as follows. Given an input sequence $\sigma$, we wish to construct a global weight function $W_\sigma : SG(\sigma) \to \mathbb{R}$ that maps each packet transmitted by SG from the switch to a real value. In order to achieve a constant competitive ratio it is sufficient to prove the following:

1. $\sum_{p \in \mathrm{Opt}(\sigma) \setminus SG(\sigma)} v(p) \le \sum_{p \in SG(\sigma)} W_\sigma(p)$.

2. $W_\sigma(p) \le c \cdot v(p)$, for each packet $p \in SG(\sigma)$, and for some global constant c.

Note that by the first condition the total weight that is mapped to packets which were transmitted by SG covers the total value lost by SG compared with Opt. By the second condition this is done by charging each packet with a weight that exceeds its own value by no more than a constant factor. In the following we show how to construct $W_\sigma$.

Step 1: The basic mapping scheme. We begin by introducing some additional notation. Consider the single-queue admission control policy $GR_\beta$, and let A denote any single-queue admission control policy that is exercised by Opt. Assume that $GR_\beta$ and A do not operate under the same conditions. Specifically, let $\sigma_1$ and $\sigma_2$ be the input packet sequences given to A and $GR_\beta$, and let $T_1$ and $T_2$ denote the time steps at which A and $GR_\beta$, respectively, transmit from the queue. We denote by $\Lambda = \{p \in \sigma_1 \cap \sigma_2 : p \in A(\sigma_1) \setminus GR_\beta(\sigma_2)\}$ the set of packets in the intersection of the sequences that were transmitted by A and not by $GR_\beta$. The following online-constructed local mapping will serve as the basic building block in the construction of our global weight function $W_\sigma$.

Lemma 1. Let A and $GR_\beta$ (for $\beta \ge 1$) be admission control policies for a single queue, with input packet sequences $\sigma_1, \sigma_2$ and transmission times $T_1, T_2$, respectively. If $T_1 \subseteq T_2$ then there exists a mapping $M_L : \Lambda \to GR_\beta(\sigma_2)$ such that the following properties hold:

1. $v(p) \le \beta \cdot v(M_L(p))$, for every packet $p \in \Lambda$.

2. Every packet in $GR_\beta(\sigma_2) \setminus A(\sigma_1)$ is mapped to at most twice.

3. Every packet in $GR_\beta(\sigma_2) \cap A(\sigma_1)$ is mapped to at most once.

Proof. For simplicity of notation, we prove the lemma for GR, i.e. $GR_1$. The lemma then follows directly for the general case, using the same construction, since for $GR_\beta$ the ratio between the value of a discarded packet and that of any packet in the queue is at most $\beta$. We construct the mapping $M_L$ online, following the operation of the admission control policies. For the remainder of the proof we denote the contents of the queue according to the operation of GR (respectively A) by $Q_{GR}$ (respectively $Q_A$). We further denote by $M_L^{-1}(p)$ the packets that are mapped to packet p (note that $|M_L^{-1}(p)| \le 2$ for every packet p). Given a packet $p \in Q_{GR}(t)$ we denote by potential(p) the maximum number of packets that can be mapped to p according to the properties of the lemma, namely potential(p) = 1 for $p \in A(\sigma_1)$, and potential(p) = 2 otherwise. We further denote by $available_t(p)$ the number of available mappings to p at time t, namely potential(p) minus the number of packets already mapped to p until time t. A packet $p \in Q_{GR}(t)$ is called fully-mapped if no additional packet can be mapped to it, i.e. if $available_t(p) = 0$. The definition of the mapping $M_L$ appears in figure 4.

We begin by observing that the mapping Ml indeed adheres to the required 
properties of the lemma. 

Claim. The mapping $M_L$ meets properties 1–3 of Lemma 1.

Proof. By induction on the time steps. First, observe that a packet is mapped to only if it is not fully-mapped; therefore properties 2–3 are satisfied. Second, by the definition of GR, when a packet p is discarded from the queue, all other packets residing in the queue have values at least as high. By the induction hypothesis, this is also true with respect to $M_L^{-1}(p)$. Therefore, all mappings concur with property 1. This completes the proof. □

Mapping $M_L : \{p \in \sigma_1 \cap \sigma_2 : p \in A(\sigma_1) \setminus GR(\sigma_2)\} \to GR(\sigma_2)$

1. Let a packet p from the domain of $M_L$ arrive at time t. Map p to the packet closest to the head in queue $Q_{GR}$ that is not fully-mapped.

2. Let $q \in Q_{GR}(t)$ be a packet that is discarded from queue $Q_{GR}$ during time t. Update $M_L$ as follows:

a) If $q \in A(\sigma_1)$, map it to the packet closest to the head that is not fully-mapped.

b) Do the same with respect to $M_L^{-1}(q)$ (if it exists).

Fig. 4. Local mapping $M_L$.

We now turn to characterizing the packets in queue $Q_{GR}$ whenever a mapped packet resides in the queue.

Claim. Let $p \in Q_{GR}(t)$ be a mapped packet in queue $Q_{GR}$ at time step t. Then the following holds:

1. $h(Q_{GR}(t))$ (the packet at the head) is mapped.

2. For every packet $q \in M_L^{-1}(p)$ we have $v(q) \le v(h(Q_{GR}(t)))$.

Proof. If packet p is at the head of queue $Q_{GR}$ at time t, then we are clearly done by the previous claim (property 1 of Lemma 1 with $\beta = 1$). Otherwise, let $t' < t$ be the time step at which packet p was mapped to for the last time. By the definition of $M_L$, all the packets closer to the head than p were already fully-mapped at time t'. Furthermore, according to the definition of GR, all those packets have values at least as high as the value of the packet (or packets) mapped to p at that time. Since FIFO order is maintained, these properties continue to hold at time t, and the claim follows. □

To complete the proof of Lemma 1 we need to prove that the definition of $M_L$ is valid, i.e. that whenever a packet is discarded, queue $Q_{GR}$ indeed contains packets that are not fully-mapped.

Claim. The definition of the mapping $M_L$ is valid; namely, at each time step the queue $Q_{GR}$ contains packets that are not fully-mapped, as assumed in the definition of $M_L$.

Proof. We define a potential function $\Phi(t) := \sum_{p \in Q_{GR}(t)} available_t(p)$ that counts the number of available mappings at every time step. We prove the claim by induction on the changes that occur in the system, i.e. packet arrivals and transmissions. Specifically, the correctness of the claim follows from the invariant inequality $\Phi(t) + |Q_A(t)| \ge |Q_{GR}(t)|$, which is shown to hold at each time step by induction. For the initial state the inequality clearly holds. We next denote by $\Delta_\Phi$, $\Delta_A$ and $\Delta_{GR}$ the changes that occur in the values of $\Phi(t)$, $|Q_A(t)|$ and $|Q_{GR}(t)|$, respectively. We examine all possible changes in the system, and show that the inequality continues to hold, either by proving it directly or by showing that $\Delta_\Phi + \Delta_A \ge \Delta_{GR}$. For the remainder of the proof we assume w.l.o.g. that A does not preempt packets.

Y. Azar and Y. Richter

1. Packet $p \in \sigma_1 \cap \sigma_2$ arrives: We examine the three possible cases:

a) p is accepted by GR while no packet is discarded: Clearly, $\Delta_\Phi \ge \Delta_{GR}$.

b) p is rejected by GR: If p is also rejected by A, nothing changes. Otherwise, queue $Q_A$ was not full before the arrival of p, as opposed to queue $Q_{GR}$. Hence, by the induction hypothesis, $\Phi > 0$ before the packet arrival. After we map p to a packet, $\Delta_\Phi + \Delta_A = 0$ and the inequality holds.

c) p is accepted by GR while a packet is discarded: Clearly, $\Delta_{GR} = 0$, since a packet is discarded only when queue $Q_{GR}$ is full. Note that in either case (p is accepted or rejected by A) $\Delta_\Phi + \Delta_A \ge 0$, since the left-hand side of the inequality first increases by two and then decreases by at most two (case 2 in the definition of $M_L$). Therefore, the inequality continues to hold.

2. Packet $p \in \sigma_2 \setminus \sigma_1$ arrives: $\Delta_\Phi \ge \Delta_{GR}$ whether p is accepted by GR or not.

3. Packet $p \in \sigma_1 \setminus \sigma_2$ arrives: $\Delta_A \ge 0$, while the other values remain unchanged.

4. Transmission step $t \in T_1 \cap T_2$: If queue $Q_{GR}$ does not hold mapped packets, the inequality trivially holds, since in this case $\Phi(t) \ge |Q_{GR}(t)|$. Otherwise, by the previous claim the packet at the head of the queue is a mapped packet. If the packet at the head of the queue is fully-mapped, then after the transmission takes place $\Delta_A = \Delta_{GR} = -1$ while $\Delta_\Phi = 0$, and the inequality holds. Otherwise, by the definition of $M_L$, there are no additional mapped packets in queue $Q_{GR}$ and we are back to the previous case.

5. Transmission step $t \in T_2 \setminus T_1$: If the packet at the head of queue $Q_{GR}$ is fully-mapped, then after the transmission takes place $\Delta_{GR} = -1$ while $\Delta_\Phi = \Delta_A = 0$, and the inequality holds. Otherwise, after the transmission there are no mapped packets in the queue and, again, by the same considerations given in the previous case, the inequality continues to hold.

□ 

This completes the proof of Lemma 1. □ 
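To make the construction of Figure 4 concrete, the Python sketch below builds $M_L$ for GR ($\beta = 1$) on a single queue. It is a simulation under simplifying assumptions chosen here, not part of the paper. The opposing policy A is a non-preemptive tail-drop queue of the same capacity, both policies receive the same arrivals ($\sigma_1 = \sigma_2$), and both transmit from the head whenever they are nonempty. Under these assumptions GR's queue is never shorter than A's, so $T_1 \subseteq T_2$. The `RuntimeError` branch corresponds to the situation that the validity claim rules out.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)          # identity semantics: distinct packets may share a value
class Packet:
    value: int
    in_a: bool = False        # accepted, and hence eventually transmitted, by A
    mapped: list = field(default_factory=list)   # M_L^{-1}(this packet)

    def available(self):
        potential = 1 if self.in_a else 2
        return potential - len(self.mapped)


def map_to_head(queue, pkt):
    """Map pkt to the packet closest to the head that is not fully-mapped."""
    for target in queue:
        if target.available() > 0:
            target.mapped.append(pkt)
            return
    raise RuntimeError("no available packet: M_L would be ill-defined")


def run_local_mapping(arrivals, capacity):
    """Build M_L (Fig. 4) for GR (beta = 1) against a tail-drop policy A.

    Both policies see the same arrivals (sigma_1 = sigma_2) and transmit from
    the head whenever they are nonempty, so T_1 is contained in T_2.
    Returns the packets transmitted by GR and by A.
    """
    q_gr, q_a, sent_gr, sent_a = [], [], [], []
    drain = [[]] * (capacity + 1)             # empty steps that flush both queues
    for batch in list(arrivals) + drain:
        for v in batch:
            p = Packet(v)
            if len(q_a) < capacity:           # A never preempts
                p.in_a = True
                q_a.append(p)
            if len(q_gr) < capacity:
                q_gr.append(p)
            elif v > min(x.value for x in q_gr):
                victim = min(q_gr, key=lambda x: x.value)
                q_gr.remove(victim)
                q_gr.append(p)
                orphans, victim.mapped = victim.mapped, []
                if victim.in_a:
                    map_to_head(q_gr, victim)  # rule 2a
                for o in orphans:
                    map_to_head(q_gr, o)       # rule 2b
            elif p.in_a:
                map_to_head(q_gr, p)           # rule 1: rejected by GR, kept by A
        if q_a:
            sent_a.append(q_a.pop(0))
        if q_gr:
            sent_gr.append(q_gr.pop(0))
    return sent_gr, sent_a


rng = random.Random(3)
arrivals = [[rng.randint(1, 9) for _ in range(rng.randint(0, 3))] for _ in range(30)]
sent_gr, sent_a = run_local_mapping(arrivals, capacity=3)
for p in sent_gr:
    # Lemma 1: at most potential(p) packets mapped to p, none more valuable than p.
    assert len(p.mapped) <= (1 if p.in_a else 2)
    assert all(q.value <= p.value for q in p.mapped)
```

Because packets are always mapped to the first packet that is not fully-mapped, the mapped packets form a prefix of the queue. This is the structural fact used in cases 4 and 5 of the proof above.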

Step 2: Defining the global weight function - phase 1. Note that algorithm SG can lose packets in one of two ways. Packets can be discarded at the input queues, and packets stored in the output queues can be preempted in favor of higher value packets. Ideally, we would like to proceed as follows. Apply the local mapping $M_L$ to each input queue to account for packets discarded there. Then, whenever a packet p preempts a packet q at an output queue, map q, along with the packets mapped to it, to packet p. Now define the global weight function such that $W_\sigma(p)$ equals the total value of packets mapped to p. However, the required condition for the construction of the mapping $M_L$ (see Lemma 1) is not met. We have no guarantee that $T_1 \subseteq T_2$, namely that SG transmits from each input queue whenever Opt does. As we shall see, this obstacle can be overcome at some extra cost.
Creating dummy packets

1. For scheduling round $t_s$ define:

$S(t_s) := \{(i,j) : VO_{ij}(t_s) \text{ contains a mapped packet}\}$

$S_1(t_s) := \{(i,j) \in S(t_s) : (i,j) \in M^{Opt}(t_s) \wedge (i,j) \in G^{SG}(t_s) \setminus M^{SG}(t_s)\}$

$S_2(t_s) := \{(i,j) \in S(t_s) : (i,j) \in M^{Opt}(t_s) \wedge (i,j) \notin G^{SG}(t_s)\}$

2. For each $(i,j) \in S_1(t_s) \cup S_2(t_s)$ do:

a) Let p be the mapped packet closest to the tail in input queue $VO_{ij}$ at time $t_s$, and let $q \in M_L^{-1}(p)$.

b) Create dummy packet $d_{ij}(t_s)$ such that $v(d_{ij}(t_s)) = v(q)$.

c) Set $M_L(q) = \emptyset$.

Fig. 5. Creating dummy packets.
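Step 1 of Figure 5 is plain set manipulation, as the Python sketch below illustrates for a single scheduling round. It assumes that $S(t_s)$ is the set of input queues $VO_{ij}$ that contain a mapped packet (part of that definition is garbled in the source text). Edges are represented as `(i, j)` pairs, and the function name is a hypothetical choice.

```python
def dummy_cases(mapped_queues, m_opt, g_sg, m_sg):
    """Step 1 of Fig. 5 for one scheduling round t_s, written with Python sets.

    All arguments are sets of (i, j) pairs:
      mapped_queues: input queues VO_ij that contain a mapped packet, i.e. S(t_s)
      m_opt: the matching M^Opt(t_s) used by Opt
      g_sg:  the eligible edges G^SG(t_s) of SG
      m_sg:  the matching M^SG(t_s) chosen by SG (a subset of g_sg)
    Returns (S_1(t_s), S_2(t_s)).
    """
    # Opt transfers from the queue, and SG could have (eligible) but did not.
    s1 = {e for e in mapped_queues if e in m_opt and e in g_sg - m_sg}
    # Opt transfers from the queue, and the head packet was not even eligible for SG.
    s2 = {e for e in mapped_queues if e in m_opt and e not in g_sg}
    return s1, s2


s1, s2 = dummy_cases(
    mapped_queues={(0, 0), (1, 1), (2, 2)},
    m_opt={(0, 0), (1, 1), (2, 2)},
    g_sg={(0, 0), (1, 1)},
    m_sg={(0, 0)},
)
print(s1, s2)  # {(1, 1)} {(2, 2)}
```

The two sets are handled differently later on. Dummy packets from $S_1(t_s)$ are charged to SG's matching in the same round, while those from $S_2(t_s)$ are charged to packets in the output queues.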



To fix the problem described in the previous paragraph, we first rectify the definition of $M_L$. Looking closer at case 4 in the proof of the claim establishing the validity of $M_L$, it should be clear that the condition in Lemma 1, namely $T_1 \subseteq T_2$, can be slightly relaxed. In fact, in order to maintain the invariant inequality defined in that proof, it is sufficient to remove a single mapping from one of the mapped packets in the queue whenever Opt transmits a packet from the queue while SG does not and the queue contains mapped packets. With that in mind, we solve the problem by creating a set of dummy packets (see figure 5) to handle each such instance. As a result, our mapping $M_L$ for the input queues is now valid. We can now define an initial global weight function by $W_\sigma(p) = \sum_{q \in M_L^{-1}(p)} v(q)$ to cover the total value lost by SG compared to Opt at the input queues, excluding the value of the dummy packets. However, we have merely shifted the problem further. For the analysis to work, we need to prove that the value of the dummy packets can be absorbed in the system, i.e. that it can be assigned to real packets. This is done in the following step.

Step 3: Defining the global weight function - phase 2. The first step is to account for all dummy packets $d_{ij}(t_s)$ with $(i,j) \in S_1(t_s)$. From now on these dummy packets will be referred to as dummy packets of the first kind, while dummy packets created due to $S_2(t_s)$ will be referred to as dummy packets of the second kind. Recall that $S_1(t_s) \subseteq G^{SG}(t_s)$. Moreover, $S_1(t_s)$ forms a matching (it is contained in $M^{Opt}(t_s)$), so its value is at most the value of the maximum weighted matching $M^{SG}(t_s)$. By the second claim in the proof of Lemma 1, each dummy packet $d_{ij}(t_s)$ has value at most $v(h(VO_{ij}(t_s)))$, the weight of edge (i,j). We modify the weight function as follows. Let $p \in M^{SG}(t_s)$ be a packet scheduled by SG at time step $t_s$. Increase $W_\sigma(p)$ by $v(p)$. We have thus absorbed the value of dummy packets of the first kind. Therefore, we may continue the analysis while ignoring these dummy packets.

We are now left only with the dummy packets of the second kind, i.e. packets of the form $d_{ij}(t_s)$ with $(i,j) \in S_2(t_s)$. Denote by $\sigma_j$ the sequence of packets scheduled by SG to output queue j throughout the operation of the algorithm. Further denote by $D_j$ the set of all dummy packets of the second kind that are destined to output queue $O_j$. Now recall that for every scheduling round $t_s$, $S_2(t_s) \cap G^{SG}(t_s) = \emptyset$. By the definition of algorithm SG, this means that $v(h(VO_{ij}(t_s))) \le \beta \cdot \min(O_j(t_s))$. By the second claim in the proof of Lemma 1 it follows that $v(d_{ij}(t_s)) \le \beta \cdot \min(O_j(t_s))$. Therefore, dummy packet $d_{ij}(t_s)$ would have been rejected from queue $O_j$ had it been sent to it. We can now apply our mapping scheme $M_L$ to queue $O_j$ with arrival sequence $\sigma_j \cup D_j$ in order to map all dummy packets in $D_j$ to real packets in the output queue $O_j$. Looking closer at the definition of $M_L$, this can be done while mapping each real packet in the output queue at most once. We then modify our global weight function by increasing the weight assigned to packet p by $v(q)$, where $q = M_L^{-1}(p)$ is a dummy packet of the second kind. Since we have now accounted for all dummy packets of the second kind, we may consider them as absorbed in the system and ignore them.

Step 4: Defining the global weight function - conclusion and recap. In order to complete the definition of the weight function $W_\sigma$, we should define the modifications in case of packet preemption at the output queues. Let packet p preempt packet q at an output queue. We increase the weight assigned to p by $v(q) + W_\sigma(q)$, to account for packet q and the weight assigned to it. Figure 6 summarizes the definition of the constructed global weight function $W_\sigma$.



Weight function $W_\sigma : SG(\sigma) \to \mathbb{R}$

1. Apply mapping scheme $M_L$ to each input queue. Add dummy packets to ensure proper operation of $M_L$.

2. Define $W_\sigma(p) = \sum_{q \in M_L^{-1}(p)} v(q)$ to cover the total lost value at input queues, excluding the dummy packets.

3. Map the value of dummy packets of the first kind created at time $t_s$ to real packets scheduled at that time.

4. Use mapping scheme $M_L$ to map dummy packets of the second kind to real packets in output queues. Modify $W_\sigma$ accordingly.

5. Whenever packet p preempts packet q at an output queue, set $W_\sigma(p) \leftarrow W_\sigma(p) + W_\sigma(q) + v(q)$.

Fig. 6. Construction of the global weight function $W_\sigma$ - summary.

So far we have shown how to account for all lost packets through the use of 
the weight function. It remains to prove that the weight assigned to each packet 
is bounded. 

Claim. By setting $\beta = 1 + \sqrt{5}$, the following holds:

1. $W_\sigma(p) \le 9.47 \cdot v(p)$, for each packet $p \in SG(\sigma) \setminus \mathrm{Opt}(\sigma)$.

2. $W_\sigma(p) \le 8.47 \cdot v(p)$, for each packet $p \in SG(\sigma) \cap \mathrm{Opt}(\sigma)$.

Proof. Consider a packet $p \in SG(\sigma) \setminus Opt(\sigma)$. According to our analysis, packet p can be assigned a weight of at most $3v(p)$ before it reaches its destined output queue (a weight of $2v(p)$ while it resides in the input queue, and an additional weight of $v(p)$ to account for dummy packets of the first kind when it is scheduled to be transferred to the output queue). If p preempts a packet q once it arrives at the output queue, then its assigned weight increases by $w_\sigma(q) + v(q)$ (note that in this case $v(p) > \beta \cdot v(q)$). Furthermore, once in the output queue, p can be assigned the value of a dummy packet of the second kind, which is bounded by $\beta \cdot v(p)$. In order to ensure a steady state in the system, we require that the total weight ever assigned to a packet p does not exceed $c \cdot v(p)$. Since $w_\sigma(q) \le c \cdot v(q)$ and $v(q) < v(p)/\beta$, this condition corresponds to the following inequality:

$$3 + \beta + \frac{c + 1}{\beta} \le c.$$

Optimizing over $\beta$, we conclude that the optimum is obtained for $\beta = 1 + \sqrt{5}$ and $c = 5 + 2\sqrt{5} \approx 9.47$. Note that a packet $p \in SG(\sigma) \cap Opt(\sigma)$ can be assigned a weight of at most $v(p)$ while it resides in the input queue (rather than $2v(p)$ in the previous case); therefore its total assigned weight is at most $8.47 \cdot v(p)$. □
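The choice $\beta = 1 + \sqrt{5}$ can be checked numerically. The sketch below assumes only the inequality from the proof, $3 + \beta + (c+1)/\beta \le c$, solves it for the smallest admissible c as a function of $\beta$, and scans $\beta$ on a fine grid; the function name `required_c` is illustrative and not taken from the paper.

```python
import math


def required_c(beta: float) -> float:
    """Smallest c satisfying 3 + beta + (c + 1) / beta <= c, for beta > 1.

    Multiplying by beta and rearranging gives c * (beta - 1) >= beta**2 + 3*beta + 1.
    """
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    return (beta * beta + 3 * beta + 1) / (beta - 1)


# Scan beta over (1, 10] in steps of 1e-4 and keep the minimiser.
best_beta = min((1 + k / 10_000 for k in range(1, 90_001)), key=required_c)

closed_form_beta = 1 + math.sqrt(5)
closed_form_c = 5 + 2 * math.sqrt(5)  # value of required_c at 1 + sqrt(5)

print(f"grid minimiser beta ~ {best_beta:.4f}")
print(f"c(1 + sqrt 5) = {required_c(closed_form_beta):.4f}")
```

Substituting $x = \beta - 1$ turns the bound into $x + 5 + 5/x$, which is minimised at $x = \sqrt{5}$; this gives $c = 5 + 2\sqrt{5} \approx 9.47$, and removing one $v(p)$ for packets in $Opt(\sigma)$ gives the 8.47 bound.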

We conclude that: 

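$$
\begin{aligned}
V(Opt(\sigma)) &= V(Opt(\sigma) \cap SG(\sigma)) + V(Opt(\sigma) \setminus SG(\sigma)) \\
&\le V(Opt(\sigma) \cap SG(\sigma)) + 8.47 \cdot V(Opt(\sigma) \cap SG(\sigma)) + 9.47 \cdot V(SG(\sigma) \setminus Opt(\sigma)) \\
&= 9.47 \cdot V(SG(\sigma)).
\end{aligned}
$$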

This completes the proof of Theorem 1. □ 
Abstract. Given n distinct points $p_1, p_2, \ldots, p_n$ in the plane, the map labeling problem with four squares is to place n axis-parallel equi-sized squares $Q_1, \ldots, Q_n$ of maximum possible size such that $p_i$ is a corner of $Q_i$ and no two squares overlap. This problem is NP-hard and no algorithm with approximation ratio better than 1/2 exists unless P = NP [10].

In this paper, we consider a scenario where we want to visualize the information gathered by smart dust, i.e., by a large set of simple devices, each consisting of a sensor and a sender that can gather sensor data and send it to a central station. Our task is to label (the positions of) these sensors in the way described by the labeling problem above. Since these devices are not positioned accurately (for example, they might be dropped from an airplane), it is natural to consider the map labeling problem under the assumption that the positions of the points are not fixed precisely, but perturbed by random noise. In other words, we consider the smoothed complexity of the map labeling problem. We present an algorithm that, under such an assumption and Gaussian random noise with sufficiently large variance, has linear smoothed complexity.



1 Introduction 

In a few years it will be possible to build autonomous devices with integrated sensors, 
communication and computing units, and power supply whose size is less than a cubic 
millimeter. A collection of these miniature sensor devices is commonly referred to as 
smart dust. One application of smart dust is the exploration of contaminated terrain, 
e.g., after a nuclear catastrophe, a firestorm, or an outbreak of a disease. The sensors
are dropped over the contaminated terrain from an aircraft. On the ground they gather 
information and use their communication units to build a network. Through this network, 
the gathered information is communicated to some central processing unit outside the 
contaminated terrain. The processing unit evaluates the incoming data. In this evaluation 
process, it may be very helpful to maintain a virtual map of the contaminated terrain and 
the sensors. In this map we want to display the value currently measured by each sensor. 

* Research is partially supported by DFG grant 872/8-2 and by the EU within the 6th Framework 
Programme under contract 001907 “Dynamically Evolving, Large Scale Information Systems" 
(DELIS). 
Therefore, we want to assign to each sensor some label where the value is displayed. This label should be large, close to the sensor's position, and it should not intersect with other labels. These kinds of labeling problems have been considered before in the context of map labeling [23]. We consider a basic map labeling problem, first studied by Formann and Wagner [10]. This problem is known to be NP-hard and there is no polynomial time approximation algorithm with an approximation factor of $1/2 + \varepsilon$ for any $\varepsilon > 0$ unless P equals NP. In this paper, we consider this labeling problem in the context of smart dust, i.e., we take into consideration that we do not have a worst case distribution.

Typically, smart dust has been modeled by n points (the sensors) distributed uni- 
formly in the unit square. In this paper we consider a refined model which captures 
more general distributions. Our refined model will be called the drop-point model. The
following observation forms the basis of the drop-point model. If we drop a single sensor 
from an airplane, we can calculate its landing point under ideal conditions (i.e., we know 
the exact speed of the aircraft, there is no wind, etc.). However, in real conditions, the 
true landing point will be somewhere near the ideal landing point. We follow standard 
assumptions and assume that, in general, the position of the true landing point is given 
by a Gaussian distribution around its ideal landing point. Of course, the idea of smart 
dust is to drop many sensors, not just a single one. But the airplane need not drop all
sensors at once. A simple electronic device could be used to drop sensors at specific
points (the drop-points). We identify the drop points with the ideal landing points of the
sensors (and not the coordinates where they are dropped). Thus in our model, we have
that each sensor (modeled by a point) is distributed according to a Gaussian distribution
with variance $\sigma^2$ around its drop-point; in other words we consider the smoothed
complexity of the map labeling problem. Since drop-points may be set arbitrarily, we are 
interested in the worst case expected complexity under this distribution where the worst 
case is taken over the distributions of drop points and the expectation is taken over the 
Gaussian distribution around the drop points. 

We show that the complexity of the labeling problem in the drop-point model is O(n) if $\sigma^2 \ge n^{-1/20+\varepsilon^*}$ for an arbitrarily small constant $\varepsilon^* > 0$. As a byproduct of this work, we also show that we have linear running time in the average case.

1.1 Related Work 

Map labeling is a fundamental problem in automated cartography concerned with the 
placement of labels near certain specified sites on a map satisfying certain constraints. 
Technical maps, where one is required to display some data or symbol near every 
point, have become increasingly important in a variety of applications like display of
radio/sensor networks, geographical information systems, etc. Among the various prob- 
lems arising in map labeling, problems related to labeling a set of points in the plane 
have been extensively studied in recent years. Formann and Wagner [10] considered the 
basic map labeling problem where, given n distinct points $p_1, p_2, \ldots, p_n$ in the plane, the objective is to place n axis-parallel equi-sized squares $Q_1, Q_2, \ldots, Q_n$ of maximum possible size such that $p_i$ is a corner of $Q_i$ and no two squares overlap. Using a reduction from 3SAT, they proved the corresponding decision problem to be NP-complete and presented an approximation algorithm which finds a valid labeling of the points with squares of size at least half the optimal size. Moreover, they showed that there is no hope to find an algorithm with a better approximation ratio unless P equals NP. In order to solve the problem exactly on small point sets, Kucera et al. [14] presented two sub-exponential time exact algorithms for the decision version of the map labeling problem. They also observed that the optimization version can be reduced to solving O(log n) decision problems using binary search on all inter-point distances. One of their algorithms has running time $2^{O(\sqrt{n})}$ and is based on a variant of the planar separator theorem. Wagner and Wolff [22] observed that for most inputs, the approximation algorithm found labelings of size close to half the optimal labeling size. They also developed several heuristics which were applicable to large point sets. Although these heuristics performed very well on most datasets and had almost linear running time, there was no theoretical justification for their performance.

Smoothed analysis was introduced by Spielman and Teng [21] as a hybrid between 
the worst-case and average-case analysis of algorithms to explain the polynomial running 
time of the simplex algorithm on practical inputs. They showed that the simplex algorithm 
with the shadow vertex pivot rule has polynomial time smoothed complexity. This work 
sparked a whole new line of research towards the smoothed analysis of algorithms [3,6]. 
The smoothed complexity of an algorithm is defined as the worst case expected running 
time of the algorithm over inputs in the neighborhood of a given instance, where the 
worst case is over all instances. Smoothed analysis helps to distinguish whether the 
space of problem instances contains only isolated hard cases or dense regions of hard 
instances. Smoothed competitive analysis of online algorithms [4] and smoothed motion
complexity [7] have also been introduced. Very recently, Beier and Voecking [5] have given a
probabilistic analysis for a large class of NP-hard optimization problems, which enables 
a nice characterization of the average case and smoothed complexity of a problem in 
terms of the worst-case complexity. 

1.2 Geometric Graph Model 

Given an upper bound r on the optimal labeling size for an instance of the map labeling problem, we consider a geometric conflict graph $G_P(2r) = (V, E)$ where the n points P (in the unit square) form the set of vertices V of the graph, and $(p, q) \in E$ iff $\|p - q\|_\infty \le 2r$. This conflict graph represents all possible conflicts that can occur between labels of size at most r of different points. The random geometric conflict graph is similar to the random geometric graph model $G(n, r) = (V, E)$ where the n nodes are distributed uniformly in $[0, 1]^2$ and there is an edge between two points if their Euclidean distance is at most r. Random geometric graphs model the spatial and connectivity constraints in various problems, and their properties such as connectivity, diameter, etc. have been studied in the context of applications such as distributed wireless networks and sensor networks [11,12,16]. There is also a lot of work on finding thresholds for various properties of random geometric graphs and on the average case complexity of problems for n points distributed uniformly and independently in the unit square [1,2,8,18,19].
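To make the definition of $G_P(2r)$ concrete, here is a brute-force construction that tests every pair of points against the $L_\infty$ threshold 2r. It is a quadratic-time illustration only, assuming points are given as (x, y) tuples; the function name is ours, and the algorithm in Section 2 deliberately avoids building this graph explicitly.

```python
from itertools import combinations
from typing import List, Set, Tuple

Point = Tuple[float, float]


def conflict_edges(points: List[Point], r: float) -> Set[Tuple[int, int]]:
    """Return the edge set of G_P(2r) as index pairs (i, j) with i < j.

    Two points conflict when their L-infinity distance is at most 2r,
    i.e. when labels of size at most r attached to them could overlap.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if max(abs(x1 - x2), abs(y1 - y2)) <= 2 * r:
            edges.add((i, j))
    return edges


print(conflict_edges([(0.0, 0.0), (0.15, 0.05), (0.5, 0.5)], r=0.1))
```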

1.3 Our Contribution 

We introduce the drop-point model for smart dust. The following observation forms the basis of our model. If we drop a single sensor from an airplane, we can calculate its landing point under ideal conditions (i.e., we know the exact speed of the aircraft, there is no wind, etc.). However, in real life we do not have ideal conditions. This means that the true landing point will be somewhere near the ideal landing point. We follow standard assumptions and assume that, in general, the position of the true landing point is given by a Gaussian distribution around its ideal landing point. So we can describe the position $\tilde{p}_i$ of the i-th point as $\tilde{p}_i = p_i + e_i(\sigma)$, where $p_i$ is the drop point of the i-th point and $e_i(\sigma)$ is a two-dimensional vector drawn from the Gaussian distribution with (one-dimensional) variance $\sigma^2$ centered at the origin.¹ This means that the distribution $\mathcal{P}$ of our input point set can be described as a pair $(P, \sigma)$, where P is the set of drop points and $\sigma^2$ is the variance of the corresponding one-dimensional Gaussian distribution. For a given number n we are interested in the worst case expected complexity achieved by any distribution with $|P| = n$, where the worst case is taken over the choice of the set of drop points. We also call this the smoothed complexity of the problem under consideration.
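A minimal sampler for the drop-point model may help fix the notation: each observed position is its drop point plus independent Gaussian noise with standard deviation $\sigma$ in each coordinate, so the one-dimensional variance is $\sigma^2$. The function name and the use of Python's `random.Random.gauss` are illustrative choices, not part of the paper.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def sample_drop_point_model(drop_points: List[Point], sigma: float,
                            seed: Optional[int] = None) -> List[Point]:
    """Perturb each drop point p_i by e_i(sigma), a 2-D Gaussian vector whose
    two coordinates are independent N(0, sigma^2) samples."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for (x, y) in drop_points]


drops = [(0.25, 0.25), (0.75, 0.75)]
print(sample_drop_point_model(drops, sigma=0.05, seed=1))
```

The worst case in the smoothed complexity is over the list `drops`; the expectation is over the noise drawn inside the function.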

In the remainder of this paper we present an exact algorithm for the map labeling problem with four squares introduced in [10] (see the next section for a definition). We show that for a wide variety of distributions of the input point set, our algorithm has linear running time. These distributions include the uniform distribution in the unit cube and the drop-point model with variance $\sigma^2 \ge n^{-1/20+\varepsilon}$. We also sketch how to extend our algorithm and analysis to other variants of the map labeling problem.

2 Labeling Sensors 

In this section we consider a basic (map) labeling problem, which was first examined in [10]. The decision version of this problem, which we call map labeling with four squares (MLFS), can be defined as follows: given a set P of n distinct points $p_1, p_2, \ldots, p_n$ in the plane and a real $\sigma$, decide if one can place n axis-parallel squares $Q_1, Q_2, \ldots, Q_n$, each of side length $\sigma$, such that each point $p_i$ is a corner of square $Q_i$ and no two squares overlap. In our algorithm, we make use of an exact algorithm by Kucera et al. [14] that solves an instance of the MLFS problem in $2^{O(\sqrt{n})}$ time.

2.1 The Algorithm 

Since in the drop-point model the points may not lie in the unit cube, the algorithm first computes a bounding box of the input point set. Then this box is partitioned into a grid with side length $r = 1/\sqrt{n^{1+\beta}}$, i.e., if the points are in the unit cube, then the unit cube is partitioned into $m = n^{1+\beta}$ grid cells for some constant $\beta > 0$. To reduce the space requirements and to have fast access to points within a given grid cell, we use perfect hashing [17]. Then we check whether there is a grid cell that contains 5 or more points. If this is true, then the following lemma gives an upper bound of r on the size of an optimal labeling.

Lemma 1. Let P be a point set such that at least 5 points are within an axis-aligned 
square Q with side length r. Then an optimal labeling of P has size less than r. 

¹ The two-dimensional Gaussian distribution is the product of two one-dimensional Gaussian
distributions. For simplicity we consider the variance of this one-dimensional distribution.
Proof. Consider 5 points within the square Q. In any valid labeling each of these points
is assigned exactly one label and these labels do not intersect. For a point p there are 
exactly 4 choices for placing the label, corresponding to the 4 quadrants centered at p. 
Therefore, in any labeling, the labels of at least 2 points are in the same quadrant. It is 
easy to see that if the label size is at least r, then the labels of 2 points from Q whose 
labels are in the same quadrant must intersect. □ 
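Lemma 1 can be sanity-checked by brute force on small instances. The sketch below is an illustration only, with helper names of our own choosing: it places five points in the half-open square $[0, r)^2$, tries all $4^5$ quadrant assignments for labels of side r, and confirms that every assignment produces two labels with overlapping interiors, whereas four points admit a conflict-free assignment.

```python
import itertools
import random
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)

# Quadrant directions: NE, NW, SW, SE (the point is the shared corner).
QUADRANTS = [(1, 1), (-1, 1), (-1, -1), (1, -1)]


def label_box(p: Point, quadrant: Tuple[int, int], s: float) -> Box:
    """Axis-parallel square of side s having p as a corner, in the given quadrant."""
    (x, y), (dx, dy) = p, quadrant
    x_lo, x_hi = sorted((x, x + dx * s))
    y_lo, y_hi = sorted((y, y + dy * s))
    return x_lo, x_hi, y_lo, y_hi


def interiors_overlap(a: Box, b: Box) -> bool:
    """True if the open interiors intersect (touching boundaries is allowed)."""
    return (max(a[0], b[0]) < min(a[1], b[1])
            and max(a[2], b[2]) < min(a[3], b[3]))


def every_labeling_conflicts(points: List[Point], s: float) -> bool:
    """Check all 4^n quadrant assignments; True if none of them is overlap-free."""
    for choice in itertools.product(QUADRANTS, repeat=len(points)):
        boxes = [label_box(p, q, s) for p, q in zip(points, choice)]
        if not any(interiors_overlap(a, b)
                   for a, b in itertools.combinations(boxes, 2)):
            return False
    return True


r = 1.0
rng = random.Random(7)
five = [(rng.random() * r, rng.random() * r) for _ in range(5)]
four = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
print(every_labeling_conflicts(five, r), every_labeling_conflicts(four, r))
```

The check mirrors the pigeonhole argument: two of the five points share a quadrant, and since their coordinates differ by less than r, their labels of side r overlap.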

In our analysis we later show that with high probability there exists a square containing 5 or more points. If our algorithm finds a square with side length r containing 5 or more points, we compute an approximation of the connected components of the conflict graph $G_P(2r)$. To do so, we consider grid cells that contain one or more points. We say that two grid cells $C_1$ and $C_2$ are r-close if there exist points $p \in C_1$ and $q \in C_2$ such that $\|p - q\|_\infty \le 2r$. Consider the transitive closure of the binary r-close relation over the set of all non-empty grid cells. Then a connected grid cell r-component is an equivalence class of this transitive closure. It is easy to see that for a fixed cell C the number of r-close grid cells is constant.

In order to approximate the connected components of $G_P(2r)$, we compute the connected grid cell 2r-components. We start with an arbitrary point and compute its grid cell C. Using our hash function we can determine all other points in the same grid cell. Since the number of 2r-close cells for C is a constant, we can use a standard graph traversal (e.g., BFS or DFS) on the grid cells to compute the connected grid cell 2r-component and all points $P_C$ contained in its grid cells in time $O(|P_C|)$. Overall, we need O(n) time to compute this approximation of the connected components of $G_P(2r)$. Then we use the algorithm from [14] to compute an optimal labeling for the points in each connected grid component. By our upper bound on the label size these labelings are mutually non-intersecting and can therefore be combined to obtain an optimal labeling. To prove the expected linear running time, we show that w.h.p. there is no connected grid component containing $\omega(\log n)$ points.
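The grid-based steps can be sketched as follows. Assumptions, all ours: a Python `dict` stands in for the perfect hash table of [17]; cells are indexed by `floor(x / r)` and `floor(y / r)` relative to the origin rather than to the bounding box; and two cells are treated as neighbours when their indices differ by at most a hypothetical constant `reach` in each coordinate. With cell side r, `reach = 2` guarantees that two points at $L_\infty$ distance at most 2r land in the same component, so the output is a (possibly coarser) approximation of the components of $G_P(2r)$, as in the text. The exact per-component solver of [14] is not shown.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def bucket_points(points: List[Point], r: float) -> Dict[Cell, List[int]]:
    """Map each non-empty grid cell (side length r) to the indices of its points."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / r), math.floor(y / r))].append(idx)
    return cells


def has_crowded_cell(points: List[Point], r: float, k: int = 5) -> bool:
    """True if some grid cell holds at least k points (Lemma 1 then gives L* < r)."""
    return any(len(idx) >= k for idx in bucket_points(points, r).values())


def grid_components(points: List[Point], r: float, reach: int = 2) -> List[List[int]]:
    """BFS over non-empty cells; cells whose indices differ by <= reach are linked."""
    cells = bucket_points(points, r)
    seen = set()
    components = []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members: List[int] = []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            # Constant number of candidate neighbours: (2 * reach + 1)^2 - 1.
            for dx in range(-reach, reach + 1):
                for dy in range(-reach, reach + 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        components.append(sorted(members))
    return components


pts = [(0.05, 0.05), (0.25, 0.05), (0.95, 0.95)]
print(has_crowded_cell(pts, r=0.1), grid_components(pts, r=0.1))
```

Each cell is enqueued once and inspects a constant number of neighbours, which is why the traversal runs in time linear in the number of points.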

The following lemma summarizes our main technical result and is proved in 
Section 3. 



Lemma 2. Let $p_i$ denote the random variable for the position of point $p_i$ according to some distribution $\mathcal{D}_i$ and let $P = \{p_1, \ldots, p_n\}$ denote the input point set. Consider a subdivision of the plane by a grid of side length $r = 1/\sqrt{n^{1+\beta}}$. Let $\alpha, \beta, \varepsilon > 0$ be constants such that $0 < \alpha < \beta < \frac{1}{20} + \frac{3\alpha}{5} - \varepsilon$ is true, and let $c_1, c_2 > 0$ be constants. If

- $\Pr[p_i \text{ is in the unit cube}] \ge c_1$,
- $\Pr[p_i \text{ is in cell } C] \le 1/n^{1+\alpha}$ for every cell C of the grid, and
- $\Pr[|p_i| > n^{c_2}] \le 2^{-n}$ for some constant $c_2$,

then algorithm FastLabeling computes an optimal labeling of P in O(n) expected time.



Using this lemma, we obtain the following result for the smoothed complexity of the MLFS problem in the drop-point model.

Theorem 1. Let $(P, \sigma)$ be a set of n points distributed according to the drop-point model and let $\sigma^2 \ge n^{-1/20+\varepsilon^*}$ for an arbitrary constant $\varepsilon^* > 0$. Then algorithm FastLabeling computes an optimal labeling for P in O(n) expected time (where the expectation is taken over the input distribution).

FastLabeling(P)

1. Compute a bounding box B of P
2. if there is $p \in P$ with $|p| > n^{c_2}$ then
3.     compute an optimal labeling using the exact algorithm.
4. Subdivide B into a grid with side length $r = 1/\sqrt{n^{1+\beta}}$
5. Pick a perfect hash function h mapping the grid cells to $\{1, \ldots, 2n\}$
6. Initialize linear lists $L_i$ for $1 \le i \le 2n$
7. for each point $p \in P$ do
8.     Let $\ell$ denote the number of the square containing p
9.     Store p in list $L_{h(\ell)}$
10. if some $L_i$ contains 5 or more items in the same square then
11.     Determine approximate connected components of graph $G_P(2r)$
12.     for each connected component C do
13.         Compute an optimal labeling for C.
14. else compute an optimal labeling using the $2^{O(\sqrt{n})}$ exact algorithm.

Fig. 1. Exact algorithm for map labeling with four squares



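Proof. For the drop-point model, the probability that a point $p_i$ is contained in grid cell C is maximized when the drop point $p_i$ is the center of C. For given variance $\sigma^2$ we have in this case

$$\Pr[p_i \text{ is in cell } C] \le \left( \int_{-r/2}^{r/2} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/(2\sigma^2)}\, dx \right)^2 \le \frac{r^2}{2\pi\sigma^2} = \frac{1}{2\pi\sigma^2 n^{1+\beta}}.$$

The second condition in Lemma 2 requires that $\Pr[p_i \text{ is in cell } C] \le 1/n^{1+\alpha}$ for every grid cell C. This holds if $\frac{1}{2\pi\sigma^2 n^{1+\beta}} \le \frac{1}{n^{1+\alpha}}$, i.e., if $\sigma^2 \ge n^{\alpha-\beta}/(2\pi)$. Since we want to minimize $\sigma$, we have to maximize $\beta - \alpha$. It can easily be seen that we can achieve $\beta - \alpha = 1/20 - \varepsilon'$ for an arbitrarily small constant $\varepsilon' > 0$. This implies that $\sigma^2 \ge n^{-1/20+\varepsilon^*}$ can be achieved for an arbitrarily small constant $\varepsilon^* > 0$. As one can easily verify, the other two conditions of Lemma 2 are also satisfied and hence the result holds. □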
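The bound used in the proof can be compared against the exact probability. For a drop point at the centre of a cell of side r, each coordinate must fall within $\pm r/2$, which for $N(0, \sigma^2)$ has probability $\operatorname{erf}(r/(2\sqrt{2}\sigma))$. The sketch below (function names are ours) computes this exact value, the density bound $r^2/(2\pi\sigma^2)$, and the smallest variance $n^{\alpha-\beta}/(2\pi)$ for which the bound meets the second condition of Lemma 2 with $r = 1/\sqrt{n^{1+\beta}}$.

```python
import math


def prob_in_centered_cell(r: float, sigma: float) -> float:
    """Exact Pr[p_i in C] when the drop point is the centre of a cell of side r."""
    one_dim = math.erf(r / (2 * math.sqrt(2) * sigma))  # Pr[|X| <= r/2], X ~ N(0, sigma^2)
    return one_dim ** 2  # the two coordinates are independent


def density_bound(r: float, sigma: float) -> float:
    """Upper bound r^2 / (2 pi sigma^2): peak density times cell area."""
    return r * r / (2 * math.pi * sigma * sigma)


def min_variance(n: int, alpha: float, beta: float) -> float:
    """Smallest sigma^2 with density_bound(r, sigma) <= n ** -(1 + alpha)
    when r = n ** (-(1 + beta) / 2)."""
    return n ** (alpha - beta) / (2 * math.pi)


n, alpha, beta = 10**6, 0.001, 0.05
r = n ** (-(1 + beta) / 2)
sigma = math.sqrt(min_variance(n, alpha, beta))
print(prob_in_centered_cell(r, sigma), density_bound(r, sigma), n ** -(1 + alpha))
```

Because $\operatorname{erf}(x) < 2x/\sqrt{\pi}$ for $x > 0$, the exact probability never exceeds the density bound, and the two agree closely when r is much smaller than $\sigma$.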



The following result for the average case complexity of the MLFS problem follows 
immediately from Lemma 2. 

Theorem 2. Let P be a set of n points distributed uniformly in the unit cube. Then 
algorithm FastLabeling computes an optimal labeling for P in O(n) expected time
(where the expectation is taken over the input distribution). □ 
3 Analysis

As mentioned before, we have to prove two main technical lemmas. First we show that with high probability at least one grid cell contains 5 or more points. Then we prove that with high probability the number of points within a connected grid cell component is O(log n). Finally, we combine these two results to get our bound on the running time.

3.1 An Upper Bound on Optimal Label Size 

We assume that the point set P is drawn according to a probability distribution that satisfies the three assumptions in Lemma 2. That is, for each point $p_i$ in P we have $\Pr[p_i \text{ is in the unit cube}] \ge c_1$, $\Pr[p_i \text{ is in cell } C] \le 1/n^{1+\alpha}$ for every cell C of a grid of side length $r = 1/\sqrt{n^{1+\beta}}$ and certain constants $\beta > \alpha > 0$, and $\Pr[|p_i| > n^{c_2}] \le 2^{-n}$ for some constant $c_2$. We denote by $L^*$ the random variable for the size of an optimal solution for the MLFS problem for the point set P. We consider only the $m = n^{1+\beta}$ squares that cover the unit cube. We show that, with high probability, one of these cells contains at least 5 points. Using Lemma 1, this implies that with high probability we have $L^* < r$. We introduce some definitions in order to prove this result.

Definition 1. A grid cell C is called $\gamma$-important for point $p_i$ if

$$\Pr[p_i \text{ is in cell } C] \ge \frac{1}{n^{1+\gamma}}.$$

Definition 2. A grid cell is $(\gamma, \delta)$-heavy if it is $\gamma$-important for more than $n^{1-\delta}$ points.

Next, we show that there are many $(\gamma, \delta)$-heavy cells if $\gamma > \beta$ and $\delta > \beta - \alpha$.

Lemma 3. Let $\gamma > \beta$, $\delta > \beta - \alpha$ and $\Pr[p_i \text{ is in the unit cube}] \ge c_1$. Then the number of $(\gamma, \delta)$-heavy cells is at least $c_1 \cdot n^{1+\alpha}/2$.

Proof. In the following we consider only the grid cells in the unit cube. We know that the sum of the probabilities $\Pr[p_i \text{ is in the unit cube}]$ over the n points is at least $c_1 \cdot n$. Now assume there are fewer than $c_1 \cdot n^{1+\alpha}/2$ grid cells that are $(\gamma, \delta)$-heavy. Then we observe that

$$\sum_{\text{heavy cell } C} \sum_{i=1}^{n} \Pr[p_i \text{ is in cell } C] < \frac{c_1 \cdot n^{1+\alpha}}{2} \cdot \frac{1}{n^{1+\alpha}} \cdot n = \frac{c_1 n}{2},$$

where the summation is over all $(\gamma, \delta)$-heavy cells in the unit cube. Further we use that

$$\sum_{C \text{ not heavy}} \sum_{i=1}^{n} \Pr[p_i \text{ is in cell } C] \le n^{1+\beta} \cdot \left( n^{1-\delta} \cdot \frac{1}{n^{1+\alpha}} + n \cdot \frac{1}{n^{1+\gamma}} \right) = o(n),$$

where the sum runs over all cells in the unit cube that are not $(\gamma, \delta)$-heavy; the last step uses $\delta > \beta - \alpha$ and $\gamma > \beta$. It follows that the overall sum of the probabilities $\Pr[p_i \text{ is in the unit cube}]$ is at most $c_1 n/2 + o(n)$, which is a contradiction. □
We are now ready to prove an upper bound on the optimal label size.

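Lemma 4. $L^* < r$ holds with probability at least $1 - \exp(-C_3 \cdot n^{1+\alpha-5\gamma-5\delta})$, where $\gamma > \beta > \alpha$, $\delta > \beta - \alpha$ and $C_3$ is a constant.

Proof. We show that the probability that there is no cell containing at least 5 points is very small. Let $B_j$ be the random variable that denotes the number of points in cell j, $1 \le j \le m$, where $m = n^{1+\beta}$. The random variables $(B_1, \ldots, B_m)$ correspond to the process of distributing n balls into m bins according to some probability distribution and satisfy the negative regression property [15].

Let H denote the set of $(\gamma, \delta)$-heavy grid cells, where $|H| \ge c_1 \cdot n^{1+\alpha}/2$ by Lemma 3. Since a cell $j \in H$ is $\gamma$-important for at least $n^{1-\delta}$ points, we have the following bound:

$$\Pr[B_j \ge k] \ge \Pr[B_j = k] \ge \binom{n^{1-\delta}}{k} \left( \frac{1}{n^{1+\gamma}} \right)^{k} \left( 1 - \frac{1}{n^{1+\alpha}} \right)^{n-k}.$$

For large enough n we have $\left(1 - \frac{1}{n^{1+\alpha}}\right)^{n-k} \ge \left(1 - \frac{1}{n}\right)^{n} \ge \frac{1}{4}$. Hence

$$\Pr[B_j < k] \le 1 - \binom{n^{1-\delta}}{k} \frac{1}{4 \cdot n^{(1+\gamma)k}}.$$

For k = 5, we have

$$\Pr[B_j < 5] \le 1 - \binom{n^{1-\delta}}{5} \frac{1}{4 \cdot (n^{1+\gamma})^{5}}. \qquad (1)$$

By negative regression it follows that, for $t_j = 5$ ($1 \le j \le m$),

$$\Pr[B_1 < 5, B_2 < 5, \ldots, B_m < 5] \le \prod_{j \in [m]} \Pr[B_j < 5] \le \prod_{j \in H} \Pr[B_j < 5]. \qquad (2)$$

Combining (1) and (2), we obtain

$$\Pr[\text{there is no cell with} \ge 5 \text{ points}] \le \exp\left(-C_3 \cdot n^{1+\alpha-5\gamma-5\delta}\right)$$

for some constant $C_3$. □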

3.2 Connected Grid Cell Components 

For the second part of our proof we show that with high probability no connected 2r-component of size $\omega(\log n)$ exists. As we already pointed out in Section 2.1, this implies that the connected components of graph $G_P(2r)$ have size $O(\log n)$ w.h.p. Finally, it suffices to solve the map labeling problem for the connected grid cell 2r-components, as there are no conflicts between points in different 2r-components.

We consider an arbitrary point $p_i$ and we want to determine the number of different sets S of s grid cells that possibly form a connected 2r-component containing $p_i$, i.e., for any two cells in S there is a path using only cells in S such that two subsequent cells on the path are 2r-close. We call a set S that satisfies this condition 2r-close. We now assume that all other points are unlabeled. To count the number of distinct 2r-close sets of cells containing $p_i$, we use a depth-first search of the component and we encode this DFS by a string over an alphabet of constant size $c_4$. If we can show that any DFS of a 2r-close set of cells of size s can be encoded by a string of length at most k, then we know that the number of these sets is at most $c_4^{k}$.

Lemma 5. The number of combinatorially distinct 2r-close sets of size s containing $p_i$ is at most $c_4^{2s}$.

Proof. We consider a DFS starting with the cell containing $p_i$. As we already observed, the number of cells that are 2r-close to a given cell is a constant. Let $c_4 - 1$ be this constant. We want to use an alphabet of size $c_4$. Thus for a given cell we can use symbols 1 to $c_4 - 1$ to identify all 2r-close cells. The remaining symbol is used as a special symbol. Now we can encode a DFS walk in the following way. We start at the cell containing $p_i$. Then the DFS chooses an unvisited 2r-close cell that contains a point from P. We encode this cell by the corresponding symbol. We continue this process recursively like a standard DFS. At a certain point of time there is no unvisited 2r-close cell. Then we use our special symbol to indicate that the algorithm has to backtrack.

Our string has length at most 2s because we use the special symbol at most once for each cell, and each other symbol corresponds to visiting a previously unvisited cell. Hence there are at most $c_4^{2s}$ such strings, and the lemma follows. □
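The encoding in the proof of Lemma 5 can be made concrete. In this sketch (names are ours), cells are integer grid coordinates; the 2r-close neighbours of a cell are assumed to be those whose indices differ by at most a hypothetical constant `reach` in each coordinate; each neighbour offset is one symbol; and `BACKTRACK` plays the role of the special symbol. A decoder replays the string from the start cell, showing that the string alone determines the set, and the string length is $2s - 1 \le 2s$.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]
BACKTRACK = -1  # the special symbol


def neighbor_offsets(reach: int = 2) -> List[Tuple[int, int]]:
    """Offsets to all cells treated as 2r-close (c_4 - 1 of them)."""
    return [(dx, dy) for dx in range(-reach, reach + 1)
            for dy in range(-reach, reach + 1) if (dx, dy) != (0, 0)]


def encode_dfs(cells: Set[Cell], start: Cell, reach: int = 2) -> List[int]:
    """Encode a DFS of a connected cell set as offset indices plus BACKTRACK."""
    offsets = neighbor_offsets(reach)
    visited = {start}
    code: List[int] = []

    def visit(c: Cell) -> None:
        for sym, (dx, dy) in enumerate(offsets):
            nb = (c[0] + dx, c[1] + dy)
            if nb in cells and nb not in visited:
                visited.add(nb)
                code.append(sym)  # step to a previously unvisited cell
                visit(nb)
        code.append(BACKTRACK)  # finished with c: return to its parent

    visit(start)
    return code


def decode_dfs(code: List[int], start: Cell, reach: int = 2) -> Set[Cell]:
    """Replay the walk; only the string and the start cell are needed."""
    offsets = neighbor_offsets(reach)
    stack, found = [start], {start}
    for sym in code:
        if sym == BACKTRACK:
            stack.pop()
        else:
            dx, dy = offsets[sym]
            nb = (stack[-1][0] + dx, stack[-1][1] + dy)
            found.add(nb)
            stack.append(nb)
    return found


S = {(0, 0), (1, 0), (3, 0), (3, 2)}
code = encode_dfs(S, (0, 0))
print(len(code), decode_dfs(code, (0, 0)) == S)
```

With `reach = 2` the alphabet has $c_4 = 25$ symbols, so this sketch would bound the number of sets of size s by $25^{2s}$.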



Our next step is to consider the probability that the cells encoded by a fixed given string form a connected 2r-component containing $p_i$ and $\ell$ other points. Let us assume that the given string encodes a connected set of cells S of size $k \le \ell$. We ignore the fact that each of these cells must contain at least one point. The overall number of points contained in our set of cells is exactly $\ell$. We have

$$\Pr[\text{there are } \ell \text{ other points in } S] \le \binom{n}{\ell} \left( \frac{k}{n^{1+\alpha}} \right)^{\ell}.$$

Now we use this formula and combine it with Lemma 5 to obtain

Lemma 6. The probability that point $p_i$ is in a connected grid cell 2r-component containing $\ell$ other points is at most $c_5^{\ell} \cdot n^{-\alpha\ell}$ for a certain constant $c_5$.

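Proof. From Lemma 5 we know that the number of candidates for combinatorially distinct 2r-components with k cells that contain $p_i$ is at most $c_4^{2k}$. Using $\binom{n}{\ell} \le (e n/\ell)^{\ell}$ and $k \le \ell$, it follows that

$$\Pr[\exists \text{ connected } 2r\text{-component with } \ell \text{ points}] \le \sum_{1 \le k \le \ell} c_4^{2k} \binom{n}{\ell} \left( \frac{k}{n^{1+\alpha}} \right)^{\ell} \le \ell \cdot c_4^{2\ell} \cdot \left( \frac{e \cdot n \cdot \ell}{\ell \cdot n^{1+\alpha}} \right)^{\ell} \le c_5^{\ell} \cdot n^{-\alpha\ell}$$

for some constant $c_5$. □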
3.3 Running Time

If there exists a point $p_i$ with $|p_i| > n^{c_2}$, then we compute the labeling using the algorithm by Kucera et al. [14]. We use perfect hashing to store all points. The key of each point is the (number of the) grid cell it is contained in. Since $|p_i| \le n^{c_2}$ for all $1 \le i \le n$, we know that the number of cells is bounded by some polynomial in n. Then we check for each point whether it is contained in the unit cube. For all points in the unit cube we use the hash function to check whether there are 5 or more points in the corresponding grid cell. This can be done in O(n) time. Then we compute the connected grid cell 2r-components. To do this, we start at an arbitrary point and start a DFS of the 2r-close grid cells. We mark all visited points. This process can also be implemented in O(n) time using our hash function. The expected running time can be calculated using the fact that with probability at most $\exp(-C_3 \cdot n^{1+\alpha-5\gamma-5\delta})$ there is no grid cell containing 5 or more points. In this case we use the $2^{O(\sqrt{n})}$-time exact algorithm. We also need that the time to process all components of size $\ell$ is at most $c_6^{\ell}$ for some constant $c_6$. Let T(n) denote the expected running time of our algorithm. We get

$$T(n) \le O(n) + 2^{O(\sqrt{n})} \cdot \left( 2^{-n} + \exp\left(-C_3 \cdot n^{1+\alpha-5\gamma-5\delta}\right) \right) + n \cdot \sum_{1 \le \ell \le n} c_5^{\ell} \cdot n^{-\alpha\ell} \cdot c_6^{\ell}.$$

For $1 + \alpha - 5\gamma - 5\delta > 1/2$ and $\alpha > 0$ we have that T(n) = O(n). Since we also have to satisfy the inequalities $\alpha < \beta < \gamma$ and $\delta > \beta - \alpha$, we get running time O(n) if the following condition can be satisfied for some $\varepsilon > 0$: $0 < \alpha < \beta < \frac{1}{20} + \frac{3\alpha}{5} - \varepsilon$. In this case we can choose $\gamma = \beta + \varepsilon$ and $\delta = \beta - \alpha + \varepsilon$, and all conditions are satisfied. This completes the proof of Lemma 2.
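The final parameter choice can be checked mechanically. This sketch takes $\gamma = \beta + \varepsilon$ as in the text and, as our own assumption, $\delta = \beta - \alpha + \varepsilon$ (any $\delta > \beta - \alpha$ is allowed). It evaluates the exponent $1 + \alpha - 5\gamma - 5\delta$ with exact rational arithmetic and compares the test "exponent > 1/2" against the closed-form condition $\beta < 1/20 + 3\alpha/5 - \varepsilon$.

```python
from fractions import Fraction as F


def exponent(alpha: F, gamma: F, delta: F) -> F:
    """Exponent in the failure probability exp(-C_3 * n^(1 + alpha - 5 gamma - 5 delta))."""
    return 1 + alpha - 5 * gamma - 5 * delta


def conditions_hold(alpha: F, beta: F, eps: F) -> bool:
    """Check 1 + alpha - 5 gamma - 5 delta > 1/2 with gamma = beta + eps and
    delta = beta - alpha + eps (an assumed choice satisfying delta > beta - alpha)."""
    gamma = beta + eps
    delta = beta - alpha + eps
    return exponent(alpha, gamma, delta) > F(1, 2)


def closed_form(alpha: F, beta: F, eps: F) -> bool:
    """The condition beta < 1/20 + 3 alpha / 5 - eps."""
    return beta < F(1, 20) + F(3, 5) * alpha - eps


alpha, eps = F(1, 100), F(1, 1000)
for beta in (F(5, 100), F(55, 1000), F(6, 100)):
    print(beta, conditions_hold(alpha, beta, eps), closed_form(alpha, beta, eps))
```

Expanding the exponent gives $1 + 6\alpha - 10\beta - 10\varepsilon$, so both tests agree exactly; since $\beta - \alpha < 1/20 - 2\alpha/5 - \varepsilon$, the gap $\beta - \alpha$ can approach 1/20, which is where the threshold in Theorem 1 comes from.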

4 Analysis for Map Labeling with Sliding Labels 

Starting from the study of the basic map labeling problem with four squares by Formann and Wagner [10], exact and approximation algorithms have been obtained for various map labeling problems concerned with labeling points with polygons of different shapes and in various possible orientations (see, e.g., [9], [13]). In this section, we show how to extend the analysis of the basic map labeling problem to the problem of labeling points with uniform axis-parallel squares in the slider model, where the point can be placed anywhere on the boundary of the square associated with it. The map labeling problem for labeling points with sliding labels is as follows:

Definition 3 (sliding labels). Given a set P of n points in the plane, place n non-overlapping uniform axis-parallel squares of maximum possible size such that each point $p_i \in P$ is associated with a unique square and lies on the boundary of its labeling square.

This problem was shown to be NP-hard by van Kreveld et al. [13]. The best known
approximation algorithm for this problem has an approximation factor of | with a 
running time of 0{nf log n) [20]. An algorithm is a polynomial time approximation 
scheme for the map labeling problem with sliding labels if it takes an additional parameter 
$\varepsilon$ and always computes a valid labeling of size at least $(1 - \varepsilon)$ times the optimum labeling
size, where the running time is polynomial in the representation of the set of points. 

For a given value $\delta$ and a point p, consider the square $S_p$ of side length $2\delta$ centered at the point p. It is easy to see that any square of side length $\delta$ that has the point p on its boundary lies completely within the square $S_p$ and has two of its corners lying on one of the sides of $S_p$. We consider a subdivision of the square $S_p$ by an axis-parallel grid of side length proportional to $\varepsilon\delta$. Now, consider a valid labeling for a set of points P with squares of side length $\delta$. For every point p, we can choose a square of size at least $\delta(1 - \varepsilon)$ such that the corners of the labeling square lie on the grid for square $S_p$. Observe that for a point p, there are $O(1/\varepsilon)$ possible positions for placing the labeling square such that its corners lie on the grid. The next lemma gives an upper bound on the optimal labeling size for the map labeling problem with sliding labels.

Lemma 7. Let P be a point set such that at least 5 points are within an axis-aligned 
square Q with side length r. Then an optimal labeling of P in the slider model has label 
size less than 2r. 

The exact algorithm of Kucera et al. [14] can be extended to obtain a polynomial time approximation scheme for the map labeling problem with sliding labels.

Lemma 8. There is a polynomial-time approximation scheme for the map labeling problem with sliding labels with a running time of $2^{O(\sqrt{n}\log(1/\varepsilon))}$.
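A rough way to picture the discretization behind this scheme: a square of side $\delta$ with p on its boundary is determined by which side p lies on and by p's offset along that side, so restricting the offset to multiples of a step $\lambda$ leaves $O(\delta/\lambda)$ candidates, i.e. $O(1/\varepsilon)$ when $\lambda$ is proportional to $\varepsilon\delta$. The sketch enumerates such candidates; the step, the parametrisation by offsets (rather than the grid on $S_p$ used in the text), and all names are our assumptions.

```python
from typing import Set, Tuple

Point = Tuple[float, float]


def slider_candidates(p: Point, delta: float, steps: int) -> Set[Tuple[float, float]]:
    """Lower-left corners of side-delta squares having p on their boundary,
    with p's offset along that side restricted to multiples of delta / steps."""
    x, y = p
    lam = delta / steps  # hypothetical discretisation step
    corners = set()
    for i in range(steps + 1):
        t = i * lam
        corners.add((round(x - t, 9), round(y, 9)))          # p on the bottom side
        corners.add((round(x - t, 9), round(y - delta, 9)))  # p on the top side
        corners.add((round(x, 9), round(y - t, 9)))          # p on the left side
        corners.add((round(x - delta, 9), round(y - t, 9)))  # p on the right side
    return corners


def on_boundary(p: Point, corner: Tuple[float, float], delta: float,
                tol: float = 1e-9) -> bool:
    """True if p lies on the boundary of the square [a, a+delta] x [b, b+delta]."""
    (x, y), (a, b) = p, corner
    inside = a - tol <= x <= a + delta + tol and b - tol <= y <= b + delta + tol
    on_side = (abs(x - a) < tol or abs(x - a - delta) < tol
               or abs(y - b) < tol or abs(y - b - delta) < tol)
    return inside and on_side


cands = slider_candidates((0.5, 0.5), delta=1.0, steps=4)
print(len(cands))
```

Corners shared by two sides are counted once, so `steps` offsets per side yield `4 * steps` distinct candidate squares.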

It is easy to see that in the slider model, there can be a conflict between the labels of two points p and q only if $d_\infty(p, q) \le 2r$, where r is an upper bound on the optimal label size. Hence, the conflict graph $G_P(2r)$ captures all possible conflicts given an upper bound r on the optimal label size. Therefore, the analysis in Sections 2 and 3 for the MLFS problem extends without much change to the slider model. In the algorithm FastLabeling(P), we can use the $\varepsilon$-approximation algorithm for sliding labels instead of the $2^{O(\sqrt{n})}$ exact algorithm.

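In the running time analysis, we need to choose values for $\alpha$, $\beta$, $\gamma$, and $\delta$ such that $1 + \alpha - 5\gamma - 5\delta > 1/2 + \log_n(\log(1/\varepsilon))$ and $\alpha > 0$ in order to have T(n) = O(n). We also have the inequalities $\alpha < \beta < \gamma$ and $\delta > \beta - \alpha$. If we can satisfy the following condition for some $\varepsilon' > 0$: $0 < \alpha < \beta < \frac{1}{20} + \frac{3\alpha}{5} - \varepsilon' - \log_n(\log(1/\varepsilon))$, then we can choose $\gamma = \beta + \varepsilon'$ and $\delta = \beta - \alpha + \varepsilon'$, and all conditions are satisfied.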

Hence, we obtain the result that an $\varepsilon$-approximate labeling for the map labeling problem in the slider model can be computed in O(n) expected time if the conditions of Lemma 2 hold.
Abstract. We present basic graph decomposition lemmas and show how
they apply as key ingredients in the probabilistic embedding theorem
stating that every $n$-point metric space probabilistically embeds in
ultrametrics with distortion $O(\log n)$, and in the proof of a similar bound
for the spreading metrics paradigm in undirected graphs. This provides
a unified framework for these metric methods, which have numerous
algorithmic applications.



1 Introduction 

This paper presents new graph decomposition lemmas which lie at the heart of 
two metric-based techniques that have been the basis for many results in approx- 
imation and online algorithms in recent years: probabilistic metric embedding 
in ultrametrics [2] and the spreading metrics paradigm [10]. We thus provide a 
unified framework for these techniques. 

Our main graph decomposition lemma is similar in spirit to the well-known 
graph decompositions of Garg, Vazirani and Yannakakis [15] and Seymour [19]. 

Such a decomposition finds a low-diameter cluster in the graph such that the
weight of the cut created is small with respect to the weight of the cluster.
Decompositions of this type have been extremely useful in the theory of approximation
algorithms, e.g. for problems such as minimum multi-cut, balanced separators, and
feedback arc-set. Our new decomposition is essentially based on the decomposition
of [15], performed in a careful manner so as to achieve a more refined bound
on the ratio between the cut and the cluster's weight. The graph decomposition
lemma is presented in Section 2.

We also give a new probabilistic decomposition lemma. Probabilistic partitions,
introduced in [2], are partitions of the graph into low-diameter clusters
with a small probability of cutting any edge. Such partitions have been used in the
context of probabilistic embeddings and have over time been shown to play an
increasingly significant role in the theory of metric embeddings. Here, we present a variation
on this idea. Instead of partitioning the entire graph, we focus on constructing
one low-diameter cluster, as in the previous lemma. Roughly, the decomposition
has the property that the probability that an edge is cut is small with respect
to the probability that it is included in the cluster. The lemma is essentially
based on the decomposition of [2], performed in a careful manner so as to achieve
a more refined bound on the ratio between these probabilities. The probabilistic
decomposition lemma is presented in Section 4.

Probabilistically embedding metric spaces in ultrametrics (and hierarchically
well-separated trees) was introduced in [2]. The original metric space is embedded
in a randomly chosen (dominating) ultrametric so that the distortion is kept
low in expectation. Such an embedding is extremely useful for algorithmic
applications, since it reduces the problem of obtaining an approximation algorithm
for the original space to obtaining an algorithm for ultrametrics, a task which is
often considerably easier. It has implications for embeddings in low-dimensional
normed spaces as well [7]. In [2] we introduced probabilistic partitions and
proved that such a partition gives a probabilistic embedding in ultrametrics. In
[3] we proved that every metric space on $n$ points can be probabilistically
embedded in ultrametrics with distortion $O(\log n \log\log n)$, based on the technique
of [19]. Recently, the bound was improved to $O(\log n)$ [13] by providing an
improved probabilistic partition, which is based on techniques from [9,12]. This
bound is tight [2]. The theorem has been used in many applications, including
metric labelling [17], buy-at-bulk network design [1], group Steiner tree [14],
min-sum clustering [5], and metrical task systems [4,11] (see the survey [16] for
a more comprehensive summary of applications). The approximation algorithms
for some of these problems can be made deterministic via the methods of [8] by
using the dual problem to probabilistic embedding, which we call distributional
embedding: given a distribution over the pairs in the metric space, construct a
single ultrametric so as to approximate the expected distance. It follows from
duality that the best solutions for these two problems have the same distortion
bound. Moreover, it is shown in [3,8] that a solution to the dual problem can
be used to construct an efficient probabilistic embedding. In [13] such a dual
solution is obtained by applying derandomization techniques to their probabilistic
algorithm. Here, we use our graph decomposition lemma to give a simple direct
construction of a distributional embedding in ultrametrics with $O(\log n)$ distortion.
This implies simpler approximation algorithms in the applications. This is
done in Section 3.

We next turn to the probabilistic embedding version, and show how our
probabilistic decomposition lemma, together with techniques from [3], yields a
simple direct efficient construction of probabilistic embedding in ultrametrics
with $O(\log n)$ distortion. This appears in Section 4.

A related notion of embedding is that of multi-embeddings introduced in [6] . 
We note that our graph decomposition lemma applies to obtain improved results 
for multi-embeddings in ultrametrics as well. 

Spreading metrics were introduced in [10]. They consider minimization problems
defined in a graph $G$ with capacities and assume the existence of a spreading
metric which provides a lower bound on the cost of a solution for the problem,
as well as some guarantee which relates the distances to the cost on smaller parts
of the problem. In the context of this paper we restrict the graph $G$ to be
undirected. This method has been applied to several problems, including linear
arrangement, balanced partitions, etc. An approximation ratio of $O(\log n \log\log n)$
was given for these problems [10], based on the technique of [19]. Most of these
were later improved in [18] to $O(\log n)$. However, their method was not general
and excluded, in particular, the problem of embedding graphs in $d$-dimensional
meshes, where we would like to minimize the average distance in the embedding.
In this paper we improve the generic spreading metric bound to $O(\log n)$. We
note that the spreading metrics paradigm applies to some other problems in
directed graphs, including directed feedback arc-set. The directed case remains
a challenging open problem. We discuss the application of our graph decomposition
lemma to the spreading metrics paradigm in Section 5.



2 Graph Decompositions 

2.1 Decomposition Lemma 

Let $G = (V,E)$ be an undirected graph with two weight functions $c, x : E \to \mathbb{R}^+$.
We interpret $x(e)$ to be the length of the edge $e$, and the distance between pairs
of vertices $u, v$ in the graph, denoted $d(u,v)$, is determined by the length of the
shortest path between them. Given a subgraph $H = (V_H, E_H)$ of $G$, let $d_H$
denote the distance in $H$, let $\Delta(H)$ denote the diameter of $H$, and let $\Delta = \Delta(G)$.
We define the volume of $H$ as $\phi(H) = \sum_{e \in E_H} x(e)c(e)$.

Given a subset $S \subseteq V$, $G(S)$ denotes the subgraph of $G$ induced by $S$.
Given a partition $(S, \bar{S})$, let $\Gamma(S) = \{(u,v) \in E : u \in S, v \in \bar{S}\}$ and
$\mathrm{cut}(S) = \sum_{e \in \Gamma(S)} c(e)$.

For a vertex $v$ and $r > 0$, the ball of radius $r$ around $v$ is defined as
$B(v,r) = \{u \in V \mid d(u,v) < r\}$. Let $S = B(v,r)$. Define

$$\phi(S) = \phi(v,r) = \sum_{e=(u,w):\, u,w \in S} x(e)c(e) + \sum_{e=(u,w) \in \Gamma(S)} c(e)\,(r - d(v,u)).$$



Given a subgraph $H$, we can similarly define $\phi_H$ with respect to the subgraph
$H$. Define the spherical-volume of $H$,

$$\phi^*(H) = \max_{v \in H} \phi_H(v, \Delta(H)/4).$$
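The definitions above translate directly into code. The following minimal Python sketch assumes a connected graph given as a vertex list plus a list of `(u, w, x, c)` edge tuples (length $x(e)$, weight $c(e)$); all function names are our own illustrations. Passing the edges of an induced subgraph $H$ yields $d_H$, $\phi_H$ and $\phi^*(H)$.

```python
import heapq
import math


def sssp(verts, edges, src):
    """d(src, .) by Dijkstra; edges are (u, w, x, c) tuples with length x and weight c."""
    adj = {u: [] for u in verts}
    for u, w, x, _ in edges:
        adj[u].append((w, x))
        adj[w].append((u, x))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for w, x in adj[u]:
            if d + x < dist.get(w, math.inf):
                dist[w] = d + x
                heapq.heappush(heap, (d + x, w))
    return dist


def diameter(verts, edges):
    """Delta(H); assumes H is connected."""
    return max(max(sssp(verts, edges, v).values()) for v in verts)


def volume(edges):
    """phi(H) = sum over e in E_H of x(e) c(e)."""
    return sum(x * c for _, _, x, c in edges)


def induced(edges, subset):
    """Edge set of the induced subgraph G(S)."""
    return [e for e in edges if e[0] in subset and e[1] in subset]


def ball(verts, edges, v, r):
    """B(v, r) = {u : d(u, v) < r}."""
    return {u for u, d in sssp(verts, edges, v).items() if d < r}


def cut(edges, S):
    """cut(S): total weight c(e) of the edges in Gamma(S)."""
    return sum(c for u, w, _, c in edges if (u in S) != (w in S))


def phi_ball(verts, edges, v, r):
    """phi(v, r): inner edges count x(e) c(e), boundary edges count c(e) (r - d(v, u))."""
    dist = sssp(verts, edges, v)
    total = 0.0
    for u, w, x, c in edges:
        du, dw = dist.get(u, math.inf), dist.get(w, math.inf)
        if du < r and dw < r:
            total += x * c
        elif du < r:
            total += c * (r - du)
        elif dw < r:
            total += c * (r - dw)
    return total


def phi_star(verts, edges):
    """Spherical volume phi*(H) = max over v of phi_H(v, Delta(H) / 4)."""
    delta = diameter(verts, edges)
    return max(phi_ball(verts, edges, v, delta / 4) for v in verts)
```

For the path a-b-c with unit lengths and weights, $\phi(a, 1.5) = 1 + 0.5 = 1.5$: the edge (a, b) lies inside the ball and the edge (b, c) contributes only its portion up to radius 1.5.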



Lemma 1. Given a graph $G$, there exists a partition $(S, \bar{S})$ of $G$, where $S$ is a
ball and

$$\mathrm{cut}(S) \le \frac{8\ln\left(\phi^*(G)/\phi^*(G(S))\right)}{\Delta(G)} \cdot \phi(S).$$



Proof. Let $v$ be a node that minimizes the ratio $\phi(v, \Delta/4)/\phi(v, \Delta/8)$. We will
choose $S = B(v,r)$ for some $r \in [\Delta/8, \Delta/4]$. We apply a standard argument,
similar to that of [15]. First observe that $\frac{d\phi(v,r)}{dr} = \mathrm{cut}(B(v,r))$. Therefore there
exists $r$ such that

$$\frac{\mathrm{cut}(B(v,r))}{\phi(v,r)} \le \left(\int_{r=\Delta/8}^{\Delta/4} \frac{d\phi(v,r)}{\phi(v,r)}\right) \Big/ \left(\frac{\Delta(G)}{8}\right) = \frac{\ln\left(\phi(v,\Delta/4)/\phi(v,\Delta/8)\right)}{\Delta(G)/8}.$$



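Now let $u$ be the node that maximizes $\phi_S(u, \Delta(G(S))/4)$. Since $\Delta(G(S)) \le
\Delta/2$, we have that $\phi^*(G(S)) \le \phi(u, \Delta/8)$. Recall that $\phi^*(G) \ge \phi(u, \Delta/4)$. By
the choice of $v$ we have $\phi(v,\Delta/4)/\phi(v,\Delta/8) \le \phi(u,\Delta/4)/\phi(u,\Delta/8) \le \phi^*(G)/\phi^*(G(S))$,
and therefore

$$\mathrm{cut}(S) \le \frac{8\ln\left(\phi^*(G)/\phi^*(G(S))\right)}{\Delta(G)} \cdot \phi(v,r).$$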

□ 
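The proof of Lemma 1 is constructive. The sketch below uses the same hypothetical `(u, w, x, c)` edge-list representation. It picks $v$ minimizing $\phi(v,\Delta/4)/\phi(v,\Delta/8)$ and then scans radii in $[\Delta/8, \Delta/4]$. Because $\mathrm{cut}(B(v,r))$ stays constant between consecutive distances $d(v,u)$ while $\phi(v,r)$ grows, the sketch tests only the right endpoint of each such interval. That discretization is our own implementation choice, not a detail from the paper.

```python
import heapq
import math


def sssp(verts, edges, src):
    """Dijkstra distances from src; edges are (u, w, length, weight) tuples."""
    adj = {u: [] for u in verts}
    for u, w, x, _ in edges:
        adj[u].append((w, x))
        adj[w].append((u, x))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for w, x in adj[u]:
            if d + x < dist.get(w, math.inf):
                dist[w] = d + x
                heapq.heappush(heap, (d + x, w))
    return dist


def phi(dist, edges, r):
    """phi(v, r) for the distance map dist = d(v, .)."""
    total = 0.0
    for u, w, x, c in edges:
        du, dw = dist.get(u, math.inf), dist.get(w, math.inf)
        if du < r and dw < r:
            total += x * c
        elif du < r:
            total += c * (r - du)
        elif dw < r:
            total += c * (r - dw)
    return total


def cut(dist, edges, r):
    """cut(B(v, r)): total weight of edges with exactly one endpoint in the ball."""
    return sum(c for u, w, _, c in edges
               if (dist.get(u, math.inf) < r) != (dist.get(w, math.inf) < r))


def lemma1_ball(verts, edges):
    """Return (v, r, S, Delta) with S = B(v, r) chosen as in the proof of Lemma 1."""
    dists = {v: sssp(verts, edges, v) for v in verts}
    delta = max(max(d.values()) for d in dists.values())

    def growth(u):
        return phi(dists[u], edges, delta / 4) / phi(dists[u], edges, delta / 8)

    v = min(verts, key=growth)
    dv = dists[v]
    # Right endpoints of the intervals on which the ball (hence the cut) is constant.
    candidates = [d for d in dv.values() if delta / 8 <= d <= delta / 4] + [delta / 4]
    r = min(candidates, key=lambda t: cut(dv, edges, t) / phi(dv, edges, t))
    return v, r, {u for u, d in dv.items() if d < r}, delta
```

The returned ball always contains $v$ and never the whole vertex set: some vertex lies at distance at least $\Delta/2 > r$ from $v$.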



2.2 Recursive Application 

The decomposition is useful when it is applied recursively. We show in
Sections 3 and 5 that our applications possess a cost function $\alpha$ over subgraphs
$\hat{G}$ of $G$ which is nonnegative, 0 on singletons, and obeys the recursion:

$$\alpha(\hat{G}) \le \alpha(\hat{G}(S)) + \alpha(\hat{G}(\bar{S})) + \Delta(\hat{G}) \cdot \mathrm{cut}(S). \qquad (1)$$

Lemma 2. The function defined by (1) obeys $\alpha(G) \le O(\log(\phi/\phi_0)) \cdot \phi(G)$, where
$\phi = \phi(G)$ and $\phi_0$ is the minimum value of $\phi(\hat{G})$ on non-singleton subgraphs $\hat{G}$.

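Proof. We prove by induction on the size of subgraphs $\hat{G}$ of $G$ that $\alpha(\hat{G}) \le
8\ln(\phi^*(\hat{G})/\phi_0) \cdot \phi(\hat{G})$. Let $\hat{G}$ be a subgraph of $G$. Using Lemma 1 for $\hat{G}$ we obtain
$S = B(v,r)$ so that

$$\begin{aligned}
\alpha(\hat{G}) &\le \alpha(\hat{G}(S)) + \alpha(\hat{G}(\bar{S})) + \Delta(\hat{G}) \cdot \mathrm{cut}(S) \\
&\le 8\ln\left(\phi^*(\hat{G}(S))/\phi_0\right) \cdot \phi(\hat{G}(S)) + 8\ln\left(\phi^*(\hat{G}(\bar{S}))/\phi_0\right) \cdot \phi(\hat{G}(\bar{S})) \\
&\quad + 8\ln\left(\phi^*(\hat{G})/\phi^*(\hat{G}(S))\right) \cdot \phi(v,r) \\
&\le 8\ln\left(\phi^*(\hat{G})/\phi_0\right) \cdot \phi(v,r) + 8\ln\left(\phi^*(\hat{G})/\phi_0\right) \cdot \phi(\hat{G}(\bar{S})) \\
&\le 8\ln\left(\phi^*(\hat{G})/\phi_0\right) \cdot \phi(\hat{G}).
\end{aligned}$$

The result follows since $\phi^*(G) \le \phi(G)$. □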

To obtain a bound depending only on $n$, we again apply a standard argument
from [15]: modify the process slightly by associating a volume $\phi(G)/n$ with each node.
This ensures that $\phi_0 \ge \phi(G)/n$.

Lemma 3. The function defined by (1) using the modified procedure obeys
$\alpha(G) \le O(\log n) \cdot \phi(G)$.
3 Embedding in Ultrametrics

Definition 1. An ultrametric (HST¹) is a metric space whose elements are the
leaves of a rooted tree $T$. Each vertex $u \in T$ is associated with a label $\Delta(u) \ge 0$
such that $\Delta(u) = 0$ iff $u$ is a leaf of $T$. It is required that if $u$ is a child of
$v$ then $\Delta(u) \le \Delta(v)$. The distance between two leaves $x, y \in T$ is defined as
$\Delta(\mathrm{lca}(x,y))$, where $\mathrm{lca}(x,y)$ is the least common ancestor of $x$ and $y$ in $T$.
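To make Definition 1 concrete, here is a minimal Python sketch that stores a labelled rooted tree as two hypothetical dictionaries, `parent` (with `None` at the root) and `label`. It checks the labelling conditions and evaluates the leaf distance $\Delta(\mathrm{lca}(x,y))$.

```python
def is_valid_hst(parent, label):
    """Check Definition 1: label 0 exactly at leaves, labels non-increasing from parent to child."""
    has_child = {node: False for node in parent}
    for node, par in parent.items():
        if par is not None:
            has_child[par] = True
    for node, par in parent.items():
        if (label[node] == 0) == has_child[node]:  # must have: leaf <=> label 0
            return False
        if label[node] < 0 or (par is not None and label[node] > label[par]):
            return False
    return True


def hst_distance(parent, label, x, y):
    """d(x, y) = Delta(lca(x, y)) for leaves x and y."""
    ancestors = set()
    node = x
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    node = y
    while node not in ancestors:  # climb from y until we meet an ancestor of x
        node = parent[node]
    return label[node]
```

Because labels never increase towards the leaves, the resulting leaf distances satisfy the strong triangle inequality $d(x,z) \le \max(d(x,y), d(y,z))$. That inequality is what makes the space an ultrametric.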

We study the set of metric spaces defined over an $n$-point set $V$. Given such
a metric space $M$, the distance between $u$ and $v$ in $M$ is denoted $d_M(u,v)$. It is
useful to identify the metric space with a complete weighted graph $G = (V,E)$,
where edge weights $x(e)$ are distances in $M$.

The following definitions are from [2]. 

Definition 2. A metric space $N$ dominates another metric space $M$ if for every
$u, v \in V$, $d_N(u,v) \ge d_M(u,v)$.

Definition 3. A metric space $M$ $\alpha$-probabilistically embeds in a class of metric
spaces $\mathcal{N}$ if every $N \in \mathcal{N}$ dominates $M$ and there exists a distribution $\mathcal{D}$ over
$\mathcal{N}$ such that for every $u, v \in V$:

$$E[d_N(u,v)] \le \alpha\, d_M(u,v).$$

The following notion serves as a dual to probabilistic embedding [3,8]. 

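Definition 4. A metric space $M$ $\alpha$-distributionally embeds in a class of metric
spaces $\mathcal{N}$ if every $N \in \mathcal{N}$ dominates $M$ and, given any weight function
$c : \binom{V}{2} \to \mathbb{R}^+$, there exists $N \in \mathcal{N}$ satisfying:

$$\sum_{u,v \in V} c(u,v)\, d_N(u,v) \le \alpha \sum_{u,v \in V} c(u,v)\, d_M(u,v).$$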

It follows from duality considerations (see [3,8]) that a metric space
$\alpha$-distributionally embeds in $\mathcal{N}$ iff it $\alpha$-probabilistically embeds in $\mathcal{N}$.

Theorem 1. Any metric space on n points O (log n)-distributionally embeds in 
an ultrametric. 

Proof. We build an ultrametric recursively as follows: use the decomposition
described in Section 2 to obtain a partition $(S, \bar{S})$ of the graph. Run the
algorithm on each part recursively, obtaining ultrametrics (HSTs) $T_S$ and $T_{\bar{S}}$
respectively. An ultrametric (HST) $T$ is constructed by creating a root $r$
labelled $\Delta$ with two children at which we root the trees $T_S$ and $T_{\bar{S}}$. Define
$\mathrm{cost}(T) = \sum_{(u,v) \in E} c(u,v)\, d_T(u,v)$. Then

$$\mathrm{cost}(T) = \mathrm{cost}(T_S) + \mathrm{cost}(T_{\bar{S}}) + \Delta \cdot \mathrm{cut}(S).$$

The theorem now follows from Lemma 3. □
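The recursive construction in this proof can be sketched in Python for a finite metric given as a distance matrix `D` with positive pair weights `c`. In the complete graph with $x(e) = d_M(e)$, shortest-path distances coincide with `D` by the triangle inequality. This is an illustration under our own assumptions: the helper names are hypothetical, and the sketch omits the node-volume modification behind Lemma 3.

```python
import math
from itertools import combinations


def _phi(D, c, pts, v, r):
    """phi(v, r) in the complete graph on pts with lengths D (shortest paths equal D)."""
    total = 0.0
    for a, b in combinations(pts, 2):
        da, db = D[v][a], D[v][b]
        if da < r and db < r:
            total += D[a][b] * c[a][b]
        elif da < r:
            total += c[a][b] * (r - da)
        elif db < r:
            total += c[a][b] * (r - db)
    return total


def _cut(D, c, pts, v, r):
    """Weight of the pairs separated by the ball B(v, r)."""
    return sum(c[a][b] for a, b in combinations(pts, 2) if (D[v][a] < r) != (D[v][b] < r))


def build_hst(D, c, pts):
    """A leaf is a point index; an internal node is (label, T_S, T_rest)."""
    if len(pts) == 1:
        return pts[0]
    delta = max(D[a][b] for a, b in combinations(pts, 2))

    def growth(u):
        return _phi(D, c, pts, u, delta / 4) / _phi(D, c, pts, u, delta / 8)

    v = min(pts, key=growth)
    cands = [D[v][u] for u in pts if delta / 8 <= D[v][u] <= delta / 4] + [delta / 4]
    r = min(cands, key=lambda t: _cut(D, c, pts, v, t) / _phi(D, c, pts, v, t))
    inside = [u for u in pts if D[v][u] < r]
    outside = [u for u in pts if D[v][u] >= r]
    return (delta, build_hst(D, c, inside), build_hst(D, c, outside))


def hst_distances(T):
    """Return (leaves, {(a, b): d_T(a, b)}) for the nested-tuple HST T."""
    if not isinstance(T, tuple):
        return [T], {}
    label, left, right = T
    left_leaves, d = hst_distances(left)
    right_leaves, d_right = hst_distances(right)
    d.update(d_right)
    for a in left_leaves:
        for b in right_leaves:
            d[(a, b)] = d[(b, a)] = label  # separated at this node: distance = its label
    return left_leaves + right_leaves, d
```

Each root label is the diameter of the point set at that node, so every pair separated there satisfies $d_T \ge d_M$. In other words, the tree dominates the metric. The ratio $\sum c\, d_T / \sum c\, d_M$ is the distributional distortion achieved for the weights `c`.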

¹ A $k$-HST [2] is defined similarly while requiring that $\Delta(u) \le \Delta(v)/k$. Any ultrametric
$k$-embeds [3] and $O(k/\log k)$-probabilistically embeds [5] in a $k$-HST.
4 Probabilistic Embedding

Theorem 1 implies the basic result on probabilistic embedding of arbitrary met- 
rics in ultrametrics. Moreover, it is shown in [3,8] how to efficiently construct the 
probabilistic embedding. In this section we combine ideas from our new graph 
decomposition together with ideas from [2,3] to obtain the following probabilistic 
decomposition lemma, which is then applied to give a simple and efficient direct 
construction of a probabilistic embedding in ultrametrics. 

4.1 Probabilistic Decomposition Lemma 

Let $R = \Delta/C$, where $C = 128$. Define $\lambda(x) = |B(x,16R)|/|B(x,R)|$. Let $v$
be the vertex that minimizes $\lambda(v)$. Choose a radius $r$ in the interval $[2R, 4R]$
according to the distribution²

$$p(r) = \frac{\lambda(v)^2}{1 - \lambda(v)^{-2}} \cdot \frac{\ln\lambda(v)}{R} \cdot \lambda(v)^{-r/R}.$$

Now define a partition $(S, \bar{S})$ where $S = B(v,r)$.
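The density $p(r)$ has the closed-form CDF $F(r) = (1 - \lambda^{2 - r/R})/(1 - \lambda^{-2})$ on $[2R, 4R]$, so $r$ can be drawn by inverse-transform sampling. The following is a minimal Python sketch under our own assumptions: the metric is a distance matrix `D` over the index list `pts`, and the function names are hypothetical. For $\lambda(v) \le 1$ it falls back to the uniform density, which is the limit of $p(r)$ as $\lambda \to 1$.

```python
import math
import random


def sample_radius(lam, R, rng=random):
    """Inverse-CDF sample from p(r) proportional to lam**(-r/R) on [2R, 4R]."""
    u = rng.random()
    if lam <= 1.0:
        # As lam -> 1 the density tends to the uniform density on [2R, 4R].
        return R * (2.0 + 2.0 * u)
    # Solve F(r) = (1 - lam**(2 - r/R)) / (1 - lam**-2) = u for r.
    return R * (2.0 - math.log(1.0 - u * (1.0 - lam ** -2)) / math.log(lam))


def probabilistic_ball(D, pts, C=128, rng=random):
    """Return (v, r, S) for the decomposition of Section 4.1 on the metric D restricted to pts."""
    delta = max(D[a][b] for a in pts for b in pts)
    R = delta / C

    def ball_size(x, rad):
        return sum(1 for y in pts if D[x][y] < rad)

    lam = {x: ball_size(x, 16 * R) / ball_size(x, R) for x in pts}
    v = min(pts, key=lam.get)
    r = sample_radius(lam[v], R, rng)
    return v, r, [x for x in pts if D[v][x] < r]
```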

The distribution above is chosen to satisfy the following relation: 

Lemma 4.

$$\Pr[(x,y) \in \Gamma(S)] \le \left[(1 - \Pr[x,y \in \bar{S}]) \cdot \ln\lambda(x) + \lambda(v)^{-1}\right] \cdot \frac{d(x,y)}{R}.$$

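Proof. Assume w.l.o.g. that $y$ is closer to $v$ than $x$. We have:

$$\Pr[(x,y) \in \Gamma(S)] = \int_{d(v,y)}^{d(v,x)} p(r)\,dr = \frac{\lambda(v)^2}{1-\lambda(v)^{-2}}\, \lambda(v)^{-d(v,y)/R}\left(1 - \lambda(v)^{-\frac{d(v,x)-d(v,y)}{R}}\right) \le \frac{\lambda(v)^2}{1-\lambda(v)^{-2}}\, \lambda(v)^{-d(v,y)/R} \cdot \ln\lambda(v) \cdot \frac{d(x,y)}{R}, \qquad (2)$$

$$1 - \Pr[x,y \in \bar{S}] = \int_{d(v,y)}^{4R} p(r)\,dr = \frac{\lambda(v)^2}{1-\lambda(v)^{-2}}\left(\lambda(v)^{-d(v,y)/R} - \lambda(v)^{-4}\right). \qquad (3)$$

Therefore we have:

$$\Pr[(x,y) \in \Gamma(S)] - (1 - \Pr[x,y \in \bar{S}]) \cdot \ln\lambda(x) \cdot \frac{d(x,y)}{R} \le \frac{\lambda(v)^{-2}}{1-\lambda(v)^{-2}} \cdot \ln\lambda(v) \cdot \frac{d(x,y)}{R} \le \lambda(v)^{-1} \cdot \frac{d(x,y)}{R},$$

where the inequality in (2) uses $1 - e^{-t} \le t$ together with $d(v,x) - d(v,y) \le d(x,y)$,
we have used the fact that $\lambda(x) \ge \lambda(v)$, and the last inequality holds for
any value $\lambda(v) \ge 1$. This completes the proof of the lemma. □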
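As a sanity check, the inequality of Lemma 4 can be tested numerically with the closed-form CDF of $p(r)$. The following minimal Python sketch uses hypothetical helpers `cdf` and `lemma4_holds`. It instantiates only the quantities that appear in the proof ($a = d(v,y) \le b = d(v,x)$, $b - a \le d(x,y)$, $\lambda(x) \ge \lambda(v) > 1$) and does not build a graph.

```python
import math


def cdf(lam, R, t):
    """F(t) = Pr[r <= t] for the density p(r) of Section 4.1 (lam > 1)."""
    t = min(max(t, 2 * R), 4 * R)
    return (1 - lam ** (2 - t / R)) / (1 - lam ** -2)


def lemma4_holds(lam_v, lam_x, R, a, b, d):
    """Check Lemma 4 for d(v, y) = a <= d(v, x) = b, b - a <= d = d(x, y), lam_x >= lam_v."""
    p_cut = cdf(lam_v, R, b) - cdf(lam_v, R, a)  # Pr[a < r <= b]: the ball separates x and y
    p_not_both_out = 1 - cdf(lam_v, R, a)        # 1 - Pr[x, y in S-bar] = Pr[r > a]
    rhs = (p_not_both_out * math.log(lam_x) + 1 / lam_v) * d / R
    return p_cut <= rhs + 1e-12
```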

² This is indeed a probability distribution. Note that we may assume $\lambda(v) > 1$;
otherwise the partition is trivial.
4.2 The Probabilistic Embedding Theorem

Theorem 2. Any metric space on $n$ points $O(\log n)$-probabilistically embeds in
an ultrametric.



Proof. We build an ultrametric recursively as follows: use the probabilistic
decomposition described above to obtain a partition $(S, \bar{S})$ of the graph. Run the
algorithm on each part recursively, obtaining ultrametrics (HSTs) $T_S$ and $T_{\bar{S}}$
respectively. An ultrametric (HST) $T$ is constructed by creating a root $r$ labelled
$\Delta$ with two children at which we root the trees $T_S$ and $T_{\bar{S}}$.

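The analysis shows that for every $(x,y) \in E$:

$$\frac{E[d_T(x,y)]}{C \cdot d(x,y)} \le \ln|B(x,16R)| + \ln|B(x,8R)|.$$

Since both terms are at most $\ln n$, this gives $E[d_T(x,y)] \le 2C\ln n \cdot d(x,y)$.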

The proof is by induction on the construction of the HST. We have

$$E[d_T(x,y)] = \Pr[(x,y) \in \Gamma(S)] \cdot \Delta + \Pr[x,y \in S] \cdot E[d_{T_S}(x,y)] + \Pr[x,y \in \bar{S}] \cdot E[d_{T_{\bar{S}}}(x,y)]. \qquad (4)$$

We now apply the induction hypothesis. We assume that $\Pr[(x,y) \in \Gamma(S)] \ne
0$; otherwise the result follows immediately from the induction hypothesis. This
implies that $B(v,4R)$ includes $x$ or $y$. We may also assume $d(x,y) \le R/2$;
otherwise the result is immediate. It follows that $d(v,x) \le 5R$, and so
$B(v,R) \subseteq B(x,8R)$. Since moreover $B(v,R) \subseteq S$ (as $r \ge 2R$), the induction
hypothesis for $\bar{S}$ gives

$$\frac{E[d_{T_{\bar{S}}}(x,y)]}{C \cdot d(x,y)} \le \ln|B(x,16R) - S| + \ln|B(x,8R) - S| \le \ln|B(x,16R)| + \ln(|B(x,8R)| - |B(v,R)|). \qquad (5)$$

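Since $\Delta(S) \le 8R$, the recursive call on $S$ uses the parameter $\Delta(S)/C \le R/16$, and
we have:

$$\frac{E[d_{T_S}(x,y)]}{C \cdot d(x,y)} \le \ln|B(x,R)| + \ln|B(x,R/2)|,$$

and since $B(v,2R)$ does not include both $x$ and $y$, we may assume that $d(v,x) \ge
2R - d(x,y) \ge \frac{3}{2}R$. Thus, we have that $B(x,R/2) \subseteq B(x,8R) - B(v,R)$,
which gives:

$$\frac{E[d_{T_S}(x,y)]}{C \cdot d(x,y)} \le \ln|B(x,R)| + \ln(|B(x,8R)| - |B(v,R)|). \qquad (6)$$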

We apply Lemma 4. Recall that $\lambda(v)^{-1} = |B(v,R)|/|B(v,16R)|$. Since
$d(v,x) \le 5R$ we have that $B(x,8R) \subseteq B(v,16R)$, hence $\lambda(v)^{-1} \le
|B(v,R)|/|B(x,8R)|$. Thus the lemma implies:

$$\Pr[(x,y) \in \Gamma(S)] \le \left[(1 - \Pr[x,y \in \bar{S}]) \cdot \ln\frac{|B(x,16R)|}{|B(x,R)|} + \frac{|B(v,R)|}{|B(x,8R)|}\right] \cdot \frac{d(x,y)}{R}. \qquad (7)$$
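We get the inductive claim by plugging the above inequalities into (4), using
$\Delta = C \cdot R$ and $\Pr[x,y \in S] \le 1 - \Pr[x,y \in \bar{S}]$:

$$\begin{aligned}
\frac{E[d_T(x,y)]}{C \cdot d(x,y)} &\le (1 - \Pr[x,y \in \bar{S}]) \left(\ln\frac{|B(x,16R)|}{|B(x,R)|} + \ln|B(x,R)|\right) + \Pr[x,y \in \bar{S}] \cdot \ln|B(x,16R)| \\
&\quad + \ln(|B(x,8R)| - |B(v,R)|) + \frac{|B(v,R)|}{|B(x,8R)|} \\
&\le \ln|B(x,16R)| + \ln|B(x,8R)|,
\end{aligned}$$

where the last step uses $\ln(a - b) + b/a \le \ln a$ for $0 \le b < a$.

□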
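The following Python sketch samples trees from this recursive construction and estimates the expected stretch empirically. It is an illustration under our own assumptions: a finite metric given as a distance matrix `D`, hypothetical helper names, and $C = 128$ as above. A Monte Carlo estimate illustrates the construction but is of course not a proof of the $O(\log n)$ bound.

```python
import math
import random
from itertools import combinations


def sample_radius(lam, R, rng):
    """Inverse-CDF sample from p(r) proportional to lam**(-r/R) on [2R, 4R]."""
    u = rng.random()
    if lam <= 1.0:
        return R * (2.0 + 2.0 * u)  # limiting uniform density
    return R * (2.0 - math.log(1.0 - u * (1.0 - lam ** -2)) / math.log(lam))


def random_hst(D, pts, rng, C=128):
    """Sample one HST; a leaf is a point, an internal node is (Delta, T_S, T_rest)."""
    if len(pts) == 1:
        return pts[0]
    delta = max(D[a][b] for a, b in combinations(pts, 2))
    R = delta / C

    def lam(x):  # lambda(x) = |B(x, 16R)| / |B(x, R)| inside the current subspace
        return sum(D[x][y] < 16 * R for y in pts) / sum(D[x][y] < R for y in pts)

    v = min(pts, key=lam)
    r = sample_radius(lam(v), R, rng)
    inside = [x for x in pts if D[v][x] < r]
    outside = [x for x in pts if D[v][x] >= r]
    return (delta, random_hst(D, inside, rng, C), random_hst(D, outside, rng, C))


def hst_distances(T):
    """Return (leaves, {(a, b): d_T(a, b)}) for the nested-tuple HST T."""
    if not isinstance(T, tuple):
        return [T], {}
    label, left, right = T
    left_leaves, d = hst_distances(left)
    right_leaves, d_right = hst_distances(right)
    d.update(d_right)
    for a in left_leaves:
        for b in right_leaves:
            d[(a, b)] = d[(b, a)] = label
    return left_leaves + right_leaves, d


def estimate_stretch(D, trials=200, seed=0):
    """Largest empirical mean of d_T(a, b) / D[a][b] over all pairs (Monte Carlo estimate)."""
    rng = random.Random(seed)
    n = len(D)
    totals = dict.fromkeys(combinations(range(n), 2), 0.0)
    for _ in range(trials):
        _, dT = hst_distances(random_hst(D, list(range(n)), rng))
        for a, b in totals:
            totals[(a, b)] += dT[(a, b)] / D[a][b]
    return max(t / trials for t in totals.values())
```

Every sampled tree dominates the metric, because each node label is the diameter of its point set. Only the expected stretch is random.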



5 Spreading Metrics 

The spreading metric paradigm [10] applies to minimization problems on undirected
graphs $G = (V,E)$ with edge capacities $c(e) \ge 1$ for $e \in E$. Associated
is an auxiliary graph $H$ and a scaler function on subgraphs of $H$. A decomposition
tree $T$ is a tree with nodes corresponding to non-overlapping subsets
of $V$, forming a recursive partition of $V$. For a node $t$ of $T$ denote by $V_t$ the
subset at $t$. Associated are the subgraphs $G_t, H_t$ induced by $V_t$. Let $F_t$ be the
set of edges that connect vertices that belong to different children of $t$, and
$c(F_t) = \sum_{e \in F_t} c(e)$. The cost of $T$ is $\mathrm{cost}(T) = \sum_{t \in T} \mathrm{scaler}(H_t) \cdot c(F_t)$.
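The cost of a decomposition tree can be evaluated directly from this definition. The following is a minimal Python sketch under our own assumptions. The tree is given as nested tuples with vertices at the leaves, and edges are `(u, w, c)` triples. A caller-supplied `scaler` callback on vertex sets stands in for $\mathrm{scaler}(H_t)$. The callback and the example choice `scaler = len` are hypothetical and not tied to any specific problem in [10].

```python
def leaves(t):
    """Vertices stored at the leaves of the nested-tuple tree t."""
    if not isinstance(t, tuple):
        return [t]
    return [v for child in t for v in leaves(child)]


def tree_cost(t, edges, scaler):
    """cost(T) = sum over internal nodes t of scaler(V_t) * c(F_t)."""
    if not isinstance(t, tuple):
        return 0.0
    owner = {v: i for i, child in enumerate(t) for v in leaves(child)}  # vertex -> child index
    crossing = sum(c for u, w, c in edges  # c(F_t): edges split between different children
                   if u in owner and w in owner and owner[u] != owner[w])
    return scaler(set(owner)) * crossing + sum(tree_cost(ch, edges, scaler) for ch in t)
```

For the tree `((0, 1), (2, 3))` over the path 0-1-2-3 with unit capacities and `scaler = len`, the root contributes 4 · 1 and each child contributes 2 · 1, for a total cost of 8.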

Definition 5. A spreading metric is a function $x : E \to \mathbb{R}^+$ satisfying:
$\sum_{e \in E} x(e)c(e)$ is a lower bound on the optimal cost, and for any $U \subseteq V$, with $H_U$
the subgraph of $H$ induced by $U$, $\Delta(U) \ge \mathrm{scaler}(H_U)$.

Theorem 3. There exists an $O(\log n)$ approximation for problems in the spreading
metrics paradigm.

Proof. The algorithm is defined recursively: use the decomposition described in
Section 2 to obtain a partition $(S, \bar{S})$ of the graph. Run the algorithm on each
part recursively. A tree $T$ is constructed by creating a root $r$ with two children
at which we root the trees $T_S$ and $T_{\bar{S}}$. The cost of the tree is given by

$$\mathrm{cost}(T) = \mathrm{cost}(T_S) + \mathrm{cost}(T_{\bar{S}}) + \mathrm{scaler}(H) \cdot \mathrm{cut}(S) \le \mathrm{cost}(T_S) + \mathrm{cost}(T_{\bar{S}}) + \Delta \cdot \mathrm{cut}(S).$$

The result now follows from Lemma 3. □



References 

1. B. Awerbuch and Y. Azar. Buy-at-Bulk Network Design. In Proc. 38th IEEE 
Symp. on Foundations of Computer Science , 542-547, 1997. 

2. Y. Bartal, Probabilistic Approximation of Metric Spaces and its Algorithmic
Applications. In Proc. of the 37th Ann. IEEE Symp. on Foundations of Computer
Science, 184-193, 1996.




3. Y. Bartal, On Approximating Arbitrary Metrics by Tree Metrics. In Proc. of the 
30th Ann. ACM Symp. on Theory of Computing, 161-168, 1998. 

4. Y. Bartal, A. Blum, C. Burch, and A. Tomkins. A polylog(n)-Competitive Algo- 
rithm for Metrical Task Systems. In Proc. of the 29th Ann. ACM Symp. on Theory 
of Computing, 711-719, 1997. 

5. Y. Bartal, M. Charikar, and D. Raz. Approximating Min-Sum Clustering in Metric
Spaces. In Proc. of the 33rd Ann. ACM Symp. on Theory of Computing, 11-20,
2001.

6. Y. Bartal and M. Mendel. Multi-Embedding and Path Approximation of Metric 
Spaces. In Proc. of the 14th ACM-SIAM Symp. on Discrete Algorithms, 424-433, 
2003. 

7. Y. Bartal and M. Mendel. Dimension Reduction for ultrametrics. In Proc. of the 
15th ACM-SIAM Symp. on Discrete Algorithms, 664-665, January 2004. 

8. M. Charikar, C. Chekuri, A. Goel, S. Guha, and S. Plotkin. Approximating a Finite
Metric by a Small Number of Tree Metrics. In Proc. of the 39th Ann. IEEE Symp. on
Foundations of Computer Science, 379-388, 1998.

9. G. Calinescu, H. Karloff, and Y. Rabani. Approximation Algorithms for the 0-Extension
Problem. In Proc. of the 12th ACM-SIAM Symp. on Discrete Algorithms, 8-16,
2001.

10. G. Even, J. Naor, S. Rao, and B. Schieber. Divide-and-Conquer Approximation
Algorithms via Spreading Metrics. In Proc. of the 36th Ann. IEEE Symp. on
Foundations of Computer Science, 62-71, 1995.

11. A. Fiat and M. Mendel. Better Algorithms for Unfair Metrical Task Systems
and Applications. In Proc. of the 32nd Ann. ACM Symp. on Theory of
Computing, 725-734, 2000.

12. J. Fakcharoenphol, C. Harrelson, S. Rao, and K. Talwar. An Improved Approximation
Algorithm for the 0-Extension Problem. In Proc. of the 14th ACM-SIAM
Symp. on Discrete Algorithms, 257-265, 2003.

13. J. Fakcharoenphol, S. Rao and K. Talwar. A Tight bound on Approximating Ar- 
bitrary metrics by Tree metrics. In Proc. of the 35th Ann. ACM Symp. on Theory 
of Computing, 448-455, 2003. 

14. N. Garg, G. Konjevod, and R. Ravi. A Polylogarithmic Algorithm for the Group
Steiner Tree Problem. In Journal of Algorithms, 37(1):66-84, 2000.

15. N. Garg, V. Vazirani, and M. Yannakakis. Approximate Max-Flow Min-(Multi)
Cut Theorems and Their Applications. In SIAM J. Computing 25, 235-251, 1996.

16. P. Indyk. Algorithmic Applications of Low-Distortion Geometric Embeddings. In
Proc. of the 42nd IEEE Symp. on Foundations of Computer Science, 10-33, 2001.

17. J. M. Kleinberg and E. Tardos. Approximation Algorithms for Classification Prob- 
lems with Pairwise Relationships: Metric Labeling and Markov Random Fields. In 
Proc. of the 40th Ann. IEEE Symp. on Foundations of Computer Science, 14-23, 
1999. 

18. S. Rao and A. Richa. New Approximation Algorithms for some Ordering Problems. 
In Proc. 9th ACM-SIAM Symp. on Discrete Algorithms, 211-219, 1998. 

19. P.D. Seymour. Packing Directed Circuits Fractionally. In Combinatorica, 
15(2):281-288, 1995. 



Modeling Locality: A Probabilistic Analysis of 
LRU and FWF* 



Luca Becchetti 

University of Rome “La Sapienza”, 
becchett@dis.uniroma1.it



Abstract. In this paper we explore the effects of locality on the per- 
formance of paging algorithms. Traditional competitive analysis fails to 
explain important properties of paging assessed by practical experience. 
In particular, the competitive ratios of paging algorithms that are known 
to be efficient in practice (e.g. LRU) are as poor as those of naive heuris- 
tics (e.g. FWF). It has been recognized that the main reason for these 
discrepancies lies in an unsatisfactory modelling of locality of reference 
exhibited by real request sequences. 

Following [13], we explicitly address this issue, proposing an adversarial
model in which the probability of requesting a page is also a function
of the page's age. In this way, our approach allows us to capture the
effects of locality of reference. We consider several families of distributions
and we prove that the competitive ratio of LRU becomes constant as
locality increases, as expected. This result is strengthened when the
distribution satisfies a further concavity/convexity property: in this case,
the competitive ratio of LRU is always constant.

We also propose a family of distributions parametrized by locality of 
reference and we prove that the performance of FWF rapidly degrades 
as locality increases, while the converse happens for LRU. 

We believe our results contribute to explaining the behaviours of these
algorithms in practice.



1 Introduction 

Paging is the problem of managing a two-level memory consisting of a first
level fast memory or cache and a second level slower memory, both divided into
pages of equal, fixed size. We assume that the slow memory contains a universe
$V = \{p_1, \ldots, p_M\}$ of pages that can be requested, while the cache can host a
fixed number $k$ of pages. Normally, $M \gg k$.

The input to a paging algorithm is a sequence of on-line requests
$\sigma = \sigma(1), \ldots, \sigma(n)$, each specifying a page to access. The $i$-th request is said to
have index $i$. If the page is in cache, the access operation costs nothing.
Otherwise, a page fault occurs: the requested page has to be brought from main
memory into cache in order to be accessed. If the cache is full, one of the pages
in cache has to be evicted in order to make room for the new one. When a page
fault occurs, the cost of the access operation is 1. Each request has to be served
before the next request is presented. The goal is to minimize the total cost, i.e.
the number of page faults. We use $|\sigma|$ to denote the length of $\sigma$.

Given a sequence $\sigma$, the cache state $C(\sigma)$ is the set of pages in cache after
$\sigma$ has been served. The output of a paging algorithm corresponding to request
sequence $\sigma$ is the sequence $C(\sigma_1), \ldots, C(\sigma_{|\sigma|})$ of cache states, with $\sigma_1 = \sigma(1)$
and $\sigma_i = \sigma_{i-1}\sigma(i)$ for $i > 1$.

Paging algorithms. The core of a paging algorithm is the strategy according 
to which pages are evicted in the presence of faults. In this paper we analyze the 
performance of two well known heuristics for page eviction, namely LRU (Least 
Recently Used) and FWF (Flush When Full). The former is the best known algo- 
rithm for virtual memory management in operating systems [16] and has found 
application in web caching as well. The latter has mostly theoretical interest. 
For the sake of the analysis we are also interested in the off-line optimum. A 
well known optimal algorithm for paging is LFD (Longest Forward Distance). 
These algorithms are described below. 

— Longest Forward Distance (LFD): Replace the page whose next request 
is latest. LFD is an optimal off-line algorithm and was proposed in an early 
work by Belady [2]. 

— Least Recently Used (LRU): When a page fault occurs, replace the page 
whose most recent request was earliest. 

— Flush When Full (FWF): Assume the cache is initially empty. Three
cases may occur when a page is requested: i) the page requested is in cache: in this case
the request is served at no extra cost; ii) a page fault occurs and the cache
is not full: in this case the page is brought into cache, and no page is evicted; iii)
a page fault occurs and the cache is full: in this case, all pages in the cache
are evicted, i.e. the cache is flushed, and the requested page is then brought into
cache. Note that in both case ii) and case iii) the cost incurred by the algorithm is 1.
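The three eviction rules can be compared on concrete request sequences. The following minimal Python sketch counts faults for LRU, FWF and LFD (Belady's rule) with cache size `k`. The function names are our own, pages may be any hashable values, and the quadratic-time LFD lookahead is chosen for clarity rather than speed.

```python
import math
from collections import OrderedDict


def lru_faults(requests, k):
    """Least Recently Used: on a fault with a full cache, evict the least recently used page."""
    cache, faults = OrderedDict(), 0  # keys ordered from least to most recently used
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
            continue
        faults += 1
        if len(cache) == k:
            cache.popitem(last=False)
        cache[page] = None
    return faults


def fwf_faults(requests, k):
    """Flush When Full: on a fault with a full cache, empty the cache, then load the page."""
    cache, faults = set(), 0
    for page in requests:
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            cache.clear()
        cache.add(page)
    return faults


def _next_use(requests, start, page):
    """Index of the next request for page at or after start (inf if there is none)."""
    for j in range(start, len(requests)):
        if requests[j] == page:
            return j
    return math.inf


def lfd_faults(requests, k):
    """Longest Forward Distance: evict the cached page whose next request is latest."""
    cache, faults = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            cache.remove(max(cache, key=lambda q: _next_use(requests, i + 1, q)))
        cache.add(page)
    return faults
```

On `"abacab"` with `k = 2`, LRU and LFD each incur 4 faults while FWF incurs 5: FWF flushes page `a` on the request for `c`, just before `a` is requested again.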

Both LRU and FWF belong to the class of marking algorithms. Roughly 
speaking, marking algorithms try to keep recently requested pages in cache, 
under the assumption that pages that have been requested in the recent past are 
more likely to be requested in the near future. Many marking algorithms have
been proposed in practice; the most prominent ones are described in [3,16].
Contribution of this paper. While theory shows that LRU and the naive FWF
heuristic have the same competitive ratio, practice provides very different
results [18]: LRU's performance is almost optimal on most instances of practical
interest, whereas FWF often behaves poorly, as intuition suggests. This
discrepancy is a consequence of the worst-case approach followed by competitive
analysis, which fails to capture the phenomenon of locality of reference [16].

In this paper, we extend the diffuse adversary approach introduced by
Koutsoupias and Papadimitriou [13]. In their model, the adversary is constrained
to generate the input sequence according to a given family of probability
distributions. We extend the work in [13] and [22,20] in the following way: i) We
propose a new framework in which the probability that some page $p$ is requested,
conditioned on the request sequence observed so far, also depends on how recently $p$
was last requested; ii) We prove that, under a very large class of distributions,
LRU's competitive ratio rapidly tends to become constant as locality increases,
as observed in practice; iii) We assess the poor behaviour of FWF by proposing
a family of distributions parametrized by locality of reference and proving
that FWF's performance degrades as locality increases. Together with the
former ones, this result sharply discriminates between the behaviours of LRU and
FWF, as observed in practice.

Related results. Sleator and Tarjan [15] introduced competitive analysis and
were the first to show that FIFO and LRU are $k$-competitive. They also proved
that $k$ is a lower bound on the competitive ratio of any deterministic on-line paging
algorithm. Subsequently, Young and Torng [21,17] proved that all paging algorithms
falling in two broad classes, including FIFO, LRU and FWF, are $k$-competitive.
Torng [17] also proved that all marking algorithms are $O(k)$-competitive¹.

A considerable amount of research was devoted to overcoming the apparent 
discrepancies between theory and practice. Due to lack of space, we give a non- 
chronological overview of research in the field, giving emphasis to contributions 
that are closer to our approach in spirit. 

The results of [13] were extended by Young [22,20], who proved that the
competitive ratio of a subclass of marking algorithms (not including FWF)
rapidly becomes constant as $\epsilon = o(1/k)$, is $\Theta(\log k)$ when $\epsilon = 1/k$, and rapidly
becomes $\Omega(k)$ as $\epsilon = \omega(1/k)$. He also proved that, under this adversary, this is
asymptotically the lower bound for any deterministic on-line algorithm. The role
of randomization was explored by Fiat et al. [9], who proposed a randomized
paging algorithm MARK that is $2H_k$-competitive, a result which is tight up to a factor of 2.

Borodin et al. [4] assume that pages are vertices of an underlying graph, 
called access graph. Locality is modelled by constraining request sequences to 
correspond to simple walks in the access graph. The same model was used and 
extended in subsequent works [11,10,14,4,14] and it was extended in [8] to multi- 
pointer walks in the access graph. 

The work of [12] is similar in spirit, although the approach is probabilistic: 
the authors assume that the request sequence is generated according to a Markov 
chain M . They consider fault rate, i.e. the long term frequency of page faults 
with respect to M. 

A deterministic approach based on the working set of Denning [7] is proposed 
by Albers et al. [1]. 

Young [18,19] introduces the notion of loose competitiveness. A paging
algorithm $A$ is loosely $c(k)$-competitive if, for any request sequence, for all but a
small fraction of the possible values for $k$, the cost of the algorithm is within
$c(k)$ times the optimum. The author proves [18] that a broad class of algorithms,
including marking algorithms, is loosely $O(\log k)$-competitive.

¹ Torng's result is actually more general, but this is the aspect that is relevant to the
topics discussed in this paper.
Finally, Boyar et al. [6] analyze the relative worst order ratios of some prominent
paging algorithms, including LRU, FWF and MARK. The worst order ratio
of two algorithms A and B, initially proposed for bin packing [5], is defined as
the ratio of their worst-case costs, i.e. their worst-case number of faults in the
case of paging, over request sequences of the same length.

Roadmap. This paper is organized as follows: In Section 2 we propose our 
model and discuss some preliminaries. In Section 3 we provide a diffuse adversary 
analysis of LRU, where the distributions allowed to the adversary are used to 
model temporal locality. We prove that LRU is constant competitive with respect 
to a wide class of distributions. In Section 4 we prove that, under the same diffuse 
adversary, the performance of FWF is extremely poor. For the sake of brevity
and readability, many proofs are omitted and will be presented in the full version
of the paper.

2 Model and Preliminaries 

The diffuse adversary model was proposed by Koutsoupias and Papadimitriou
[13]. It removes the overly pessimistic assumption that we know nothing about
the distribution according to which the input is generated, instead assuming that
it is a member of some known family $\Delta$. We say that algorithm $A$ is $c$-competitive
against the $\Delta$-diffuse adversary if, for every distribution $D \in \Delta$ over the set of
possible input sequences,

$$E_D[A(\sigma)] \le c\, E_D[OPT(\sigma)] + b,$$

where $\sigma$ is generated according to $D$ and $b$ is a constant. The competitive ratio
of $A$ against the $\Delta$-diffuse adversary is the minimum $c$ such that the condition
above holds for every $D \in \Delta$. In the sequel, we drop $D$ from the expression of
the expectation when clear from context.

The request sequence is a stochastic process. Without loss of generality, in
the following we consider request sequences of length $n$. Given any request
sequence $\sigma$ and its $i$-th request $\sigma(i)$, we call $\sigma(1)\cdots\sigma(i-1)$ and $\sigma(i+1)\cdots\sigma(n)$
respectively the prefix and the postfix of $\sigma(i)$. We define the following random
variables: If i < n, r{i) denotes the i-th page requested. If i < n, r{i) denotes 
the prefix of the i-th request. 

2.1 Distribution of Page References 

Similarly to [13] and [22], the family of distributions we consider is completely
defined by the probability $P[p \mid \sigma]$ that $p$ is requested given that $\sigma$ is the prefix,
for every $p \in V$ and for every possible prefix $\sigma$ that satisfies $|\sigma| \le n - 1$. In
particular, we assume that $P[p \mid \sigma]$ belongs to some family $\Delta$. The following
definition will be useful to define some of them.

Definition 1. Consider a monotone, non-increasing function $f : I \to [0, 1]$, 
where $I = \{1, \ldots, N\}$ for some positive integer $N$. We say that $f$ is concave in 
$I$ if, for all $i = 2, \ldots, N-1$: $f(i-1) - f(i) \le f(i) - f(i+1)$. $f$ is convex in 
$I$ if, for all $i = 2, \ldots, N-1$: $f(i-1) - f(i) \ge f(i) - f(i+1)$. 
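Definition 1 is easy to check mechanically. The sketch below is a minimal illustration, assuming $f$ is given as a Python list whose index $i-1$ holds $f(i)$; the function names `is_concave` and `is_convex` and the floating-point tolerance are our own choices, not part of the paper.

```python
from typing import Sequence


def is_nonincreasing(f: Sequence[float]) -> bool:
    """True if f(1) >= f(2) >= ... >= f(N); f[i - 1] stores f(i)."""
    return all(f[i] >= f[i + 1] for i in range(len(f) - 1))


def is_concave(f: Sequence[float], tol: float = 1e-12) -> bool:
    """Definition 1: f(i-1) - f(i) <= f(i) - f(i+1) for i = 2, ..., N-1."""
    return is_nonincreasing(f) and all(
        f[i - 1] - f[i] <= f[i] - f[i + 1] + tol for i in range(1, len(f) - 1)
    )


def is_convex(f: Sequence[float], tol: float = 1e-12) -> bool:
    """Definition 1: f(i-1) - f(i) >= f(i) - f(i+1) for i = 2, ..., N-1."""
    return is_nonincreasing(f) and all(
        f[i - 1] - f[i] >= f[i] - f[i + 1] - tol for i in range(1, len(f) - 1)
    )


# A geometric-like profile drops by less and less: convex, not concave.
print(is_convex([0.5, 0.25, 0.125, 0.0625, 0.0625]))   # True
print(is_concave([0.5, 0.25, 0.125, 0.0625, 0.0625]))  # False
```

A uniform profile satisfies both conditions, which is why the uniform distribution belongs to both families.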

Assume $\sigma$ is the request sequence observed so far. Provided both pages $p_{j_1}$ 
and $p_{j_2}$ appear in request sequence $\sigma$, we say that $p_{j_1}$ is younger than $p_{j_2}$ if 
the last request for $p_{j_1}$ in $\sigma$ occurs after the last request for $p_{j_2}$, while we say 
that $p_{j_1}$ is older than $p_{j_2}$ otherwise. The age of any page $p_j$ with respect to 
$\sigma$ is then defined as follows: $\mathrm{age}(p_j, \sigma) = l$ if $p_j$ is the $l$-th most recently 
accessed page, while $\mathrm{age}(p_j, \sigma) = \infty$ if $p_j$ does not appear in $\sigma$. For the sake 
of simplicity, we assume in the sequel that $\mathrm{age}(p_j, \sigma) < \infty$ for every $p_j$. This 
assumption can be removed by slightly modifying the definitions that follow; 
for brevity, this is shown in the full version of the paper. We now define, for 
every prefix sequence $\sigma$, a total order $\mathcal{T}(\sigma)$ on $V$ by ordering pages according 
to their relative ages. More precisely, for any pages $p_{j_1}, p_{j_2}$, we have $p_{j_1} \prec_{\mathcal{T}(\sigma)} p_{j_2}$ 
if and only if $\mathrm{age}(p_{j_1}, \sigma) < \mathrm{age}(p_{j_2}, \sigma)$. $\mathcal{T}(\sigma)$ obviously implies a one-to-one 
correspondence $f_\sigma : V \to \{1, \ldots, M\}$, with $f_\sigma(p_j) = l$ if, given $\sigma$, $p_j$ is the $l$-th 
most recently accessed page. $P[p \mid \sigma]$ in turn may be regarded as a distribution 
over $\{1, \ldots, M\}$ by defining $P[x \mid \sigma] = P[f_\sigma^{-1}(x) \mid \sigma]$, for every $x = 1, \ldots, M$. 
So, for instance, by saying that $P[x \mid \sigma]$ has average value $\mu$, we mean that 

$$E[x \mid \sigma] = \sum_{x=1}^{M} x\,P[x \mid \sigma] = \mu.$$
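The map from pages to ranks can be made concrete. The following is a minimal sketch, assuming pages are hashable Python values, the prefix $\sigma$ is a list of requests, and every page already appears in $\sigma$ (the finite-age assumption above). The helper names `age_order`, `rank_distribution` and `mean_rank` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def age_order(prefix: Sequence[Hashable], pages: Sequence[Hashable]) -> List[Hashable]:
    """Return pages sorted by age w.r.t. the prefix: element l-1 has age l."""
    last_seen = {page: t for t, page in enumerate(prefix)}  # time of last request
    missing = [p for p in pages if p not in last_seen]
    if missing:
        raise ValueError(f"pages with infinite age: {missing}")
    return sorted(pages, key=lambda p: last_seen[p], reverse=True)


def rank_distribution(prefix, pages, prob: Dict[Hashable, float]) -> List[float]:
    """P[x | sigma] = P[f_sigma^{-1}(x) | sigma]; list index x-1 holds rank x."""
    return [prob[p] for p in age_order(prefix, pages)]


def mean_rank(dist: Sequence[float]) -> float:
    """E[x | sigma] = sum_x x * P[x | sigma]."""
    return sum(x * q for x, q in enumerate(dist, start=1))


sigma = ["a", "b", "c", "a"]
dist = rank_distribution(sigma, ["a", "b", "c"], {"a": 0.5, "b": 0.2, "c": 0.3})
print(dist)  # [0.5, 0.3, 0.2]  (a has age 1, c age 2, b age 3)
```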

2.2 Diffuse Adversaries 

We next define three families of probability distributions according to which, 
given any prefix $\sigma$, the next page request can be generated. 

Definition 2. $\mathcal{G}(\tau)$ is the family of probability distributions over $\{1, \ldots, M\}$ 
that have expected value at most $k/2$ and standard deviation $\tau$. $\mathcal{D}_1$ (resp. $\mathcal{D}_2$) 
is the family of distributions that are monotone, non-increasing and concave 
(resp. convex) in $\{1, \ldots, M\}$. 

The first property of Definition 2 models the well-known folklore that “90 percent 
of requests are for about 10 percent of the pages”. 
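Membership in $\mathcal{G}(\tau)$ can be tested directly from Definition 2. In this sketch we read "standard deviation $\tau$" as "standard deviation at most $\tau$", which is our assumption; `in_G` is a hypothetical helper name, and the distribution is given over ranks $1, \ldots, M$ as a list.

```python
import math
from typing import Sequence


def mean_std(dist: Sequence[float]) -> tuple:
    """Mean and standard deviation of a distribution over ranks 1..M."""
    mu = sum(x * q for x, q in enumerate(dist, start=1))
    var = sum((x - mu) ** 2 * q for x, q in enumerate(dist, start=1))
    return mu, math.sqrt(var)


def in_G(dist: Sequence[float], k: int, tau: float, tol: float = 1e-9) -> bool:
    """Definition 2 (as read here): mean at most k/2 and std at most tau."""
    if abs(sum(dist) - 1.0) > tol or min(dist) < 0:
        return False
    mu, sd = mean_std(dist)
    return mu <= k / 2 + tol and sd <= tau + tol


# Half the mass on each of the two most recent pages: mean 1.5, std 0.5.
print(in_G([0.5, 0.5, 0, 0, 0, 0, 0, 0], k=4, tau=0.5))  # True
print(in_G([0.5, 0.5, 0, 0, 0, 0, 0, 0], k=4, tau=0.4))  # False
```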

Figure 1 shows an example in which $P[x \mid \sigma] \in \mathcal{D}_2$. The $y$ coordinate is 
$P[x \mid \sigma]$. Observe that the $l$-th element along the $x$-axis represents the $l$-th most 
recently requested page, and thus the first $k$ elements correspond to the set of 
recent pages. In the case of LRU these are also the pages in the cache of the 
algorithm. 

We next describe how the adversary generates request sequences. Assume that 
a prefix request sequence $\sigma$ of length $i < n$ has been generated; the adversary 
picks a distribution $D(i)$ out of some family $\Delta$ and chooses the next page to 
request ($\sigma(i+1)$) according to $D(i)$. Observe that, in general, $D(i_1) \ne D(i_2)$ for 
$i_1 \ne i_2$. We define different adversaries, according to how $D(i)$ is chosen: 

Definition 3. An adversary is concentrated (respectively concave and convex) 
if $D(i) \in \mathcal{G}(\tau)$ (respectively $\mathcal{D}_1$ and $\mathcal{D}_2$) for all $i$. If $D(i) \in \mathcal{D}_1 \cup \mathcal{D}_2 \cup \mathcal{G}(\tau)$ 
for some $\tau$ and for all $i$, we speak of a general adversary. 

Observe that the general adversary subsumes all possible subcases. 
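A request sequence produced by such an adversary can be simulated by keeping the pages in recency order. The sketch below assumes, for illustration only, that the adversary uses the same rank distribution at every step (the model allows $D(i)$ to change with $i$), and that all $M$ pages have a finite initial age given by the order of the `pages` argument. `generate_requests` is a hypothetical name.

```python
import random
from typing import List, Sequence


def generate_requests(pages: Sequence[int], rank_probs: Sequence[float],
                      n: int, seed: int = 0) -> List[int]:
    """Draw n requests; each one asks for the x-th most recently requested page,
    with x drawn from rank_probs (index x-1 holds P[x])."""
    if len(rank_probs) != len(pages):
        raise ValueError("need one probability per rank 1..M")
    rng = random.Random(seed)
    stack = list(pages)          # stack[0] is the page of age 1
    requests = []
    for _ in range(n):
        x = rng.choices(range(len(stack)), weights=rank_probs)[0]
        page = stack.pop(x)
        stack.insert(0, page)    # the requested page becomes the youngest
        requests.append(page)
    return requests


# Always requesting rank 2 alternates between the two youngest pages.
print(generate_requests(list(range(10)), [0, 1] + [0] * 8, 4))  # [1, 0, 1, 0]
```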
Fig. 1. This picture represents the case in which $P[p \mid \sigma] \in \mathcal{D}_2$. 



2.3 Preliminary Results 

We partition each request sequence into phases in the usual way [3]. In particular, 
phases are defined inductively as follows: i) phase 1 starts with the first request 
and completes right before the $(k+1)$-th distinct request has been presented; 
ii) for $i > 1$, given phase $i-1$, phase $i$ starts with the $(k+1)$-th distinct 
request after the start of phase $i-1$ and completes right before the $(2k+1)$-th 
distinct request after the start of phase $i-1$. We point out that the last phase of 
a request sequence might contain fewer than $k$ distinct requests. However, it has 
to contain at least one. In the sequel, $F$ is the random variable that denotes the 
overall number of phases. 
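The inductive definition of phases amounts to a single left-to-right scan: a new phase begins whenever a request would be the $(k+1)$-th distinct page of the current phase. A minimal sketch (the function name is ours):

```python
from typing import Hashable, List, Sequence


def split_into_phases(requests: Sequence[Hashable], k: int) -> List[List[Hashable]]:
    """Partition requests into phases: each phase holds at most k distinct pages,
    and the next phase starts with the (k+1)-th distinct request."""
    phases: List[List[Hashable]] = []
    current: List[Hashable] = []
    distinct = set()
    for page in requests:
        if page not in distinct and len(distinct) == k:
            phases.append(current)          # close the current phase
            current, distinct = [], set()
        current.append(page)
        distinct.add(page)
    if current:                             # the last phase may be incomplete
        phases.append(current)
    return phases


print(split_into_phases([1, 2, 1, 3, 1, 3], k=2))  # [[1, 2, 1], [3, 1, 3]]
```

Every phase except possibly the last contains at least $k$ requests, which is why the number of phases never exceeds $\lceil n/k \rceil$ (Fact 3).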

We say that a page is marked in a phase if it was already requested during the 
phase. Otherwise the page is said to be unmarked. A marking algorithm never evicts 
a marked page when a page fault occurs. Observe that both LRU and FWF are 
marking algorithms. As such, they both achieve the best possible competitive 
ratio $k$ for on-line paging algorithms [17]. 

For any phase $\ell \ge 1$, we say that a page $p_j$ is new in $\ell$ if it was not 
requested during phase $\ell - 1$. Otherwise we say that $p_j$ is old. Also, at any 
point of $\ell$, any of the $k$ most recently requested pages is said to be recent. For 
any prefix $\sigma$, we denote by $\mathcal{N}(\sigma)$ and $\mathcal{R}(\sigma)$ respectively the set of new and recent 
pages after request subsequence $\sigma$ has been presented. Observe that both $\mathcal{N}(\sigma)$ 
and $\mathcal{R}(\sigma)$ do not depend on the paging algorithm. We say that page $p_j$ is critical 
if $p_j$ is old but is not currently in cache. Observe that, for any prefix $\sigma$, 
the sets of in-cache and critical pages depend on the particular paging algorithm 
under consideration. In the sequel, for any paging algorithm $A$, we denote by 
$\mathcal{C}_A(\sigma)$ and $\mathcal{O}_A(\sigma)$ respectively the sets of in-cache and critical pages. Observe 
that $\mathcal{C}_A(\sigma) \ne \mathcal{R}(\sigma)$ in general. Also, we write $\mathcal{C}(\sigma)$ for $\mathcal{C}_A(\sigma)$ and $\mathcal{O}(\sigma)$ for $\mathcal{O}_A(\sigma)$ 
whenever $A$ is clear from context. 
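To make these definitions concrete, here is a minimal sketch that runs LRU and, for each phase, counts the distinct requests to new pages and the requests to critical pages (old pages missing from LRU's cache). This is one reading of the counts $N^\ell$ and $O^\ell$ used in the analysis; we treat every page of phase 1 as new, and the function name and return format are our own.

```python
from collections import OrderedDict
from typing import Hashable, List, Sequence, Tuple


def lru_phase_counts(requests: Sequence[Hashable], k: int) -> List[Tuple[int, int]]:
    """Simulate LRU with cache size k; return (new, critical) counts per phase."""
    cache = OrderedDict()                    # last key = most recently used
    prev_phase, cur_phase = set(), set()
    counts: List[Tuple[int, int]] = []
    n_new = n_crit = 0
    for page in requests:
        if page not in cur_phase and len(cur_phase) == k:  # phase boundary
            counts.append((n_new, n_crit))
            prev_phase, cur_phase = cur_phase, set()
            n_new = n_crit = 0
        if page not in cur_phase:            # a distinct request
            if page not in prev_phase:
                n_new += 1                   # new page
            elif page not in cache:
                n_crit += 1                  # old page missing from cache: critical
        if page in cache:                    # LRU update
            cache.move_to_end(page)
        else:
            if len(cache) == k:
                cache.popitem(last=False)    # evict the least recently used page
            cache[page] = None
        cur_phase.add(page)
    if cur_phase:
        counts.append((n_new, n_crit))
    return counts


print(lru_phase_counts([1, 2, 3, 1, 2, 3], k=2))  # [(2, 0), (1, 1), (1, 1)]
```

Because LRU keeps exactly the $k$ most recent pages (Fact 2), every LRU fault is a distinct request to a new or a critical page, so the counts sum to LRU's number of faults (6 in this example).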

$O^\ell$ and $N^\ell$ are random variables whose values are respectively the number 
of requests to critical and new pages during phase $\ell$; $O^\ell = N^\ell = 0$ if no $\ell$-th 
phase exists. A request in phase $\ell$ is said to be distinct if it specifies a page that was 
not previously requested during the phase. Note that requests for new pages are 
distinct by definition. We denote by $D^\ell$ the number of distinct requests during 
phase $\ell$. Observe that, differently from $N^\ell$ and $D^\ell$, $O^\ell$ depends on the paging 
algorithm. However, in the rest of this paper the algorithm is always understood 
from context, hence we do not specify it for these variables. Finally, for 
any random variable $V$ defined over all possible request sequences of length $n$, 
we denote by $V_\sigma$ its deterministic value for a particular request sequence $\sigma$. The 
following result is due to Young [22]: 

Theorem 1 ([22]). For every request sequence $\sigma$: 



In the sequel, we assume that the first request of the first phase is for a new 
page. We can assume this without loss of generality, since a sequence of requests 
for recent pages does not change the cache state of LRU and has cost 0. This 
allows us to state the following fact: 

Fact 1 The first request of every phase is for a new page. 

The following fact, whose proof is straightforward, gives a crucial property 
of LRU: 

Fact 2 $\mathcal{C}_{\mathrm{LRU}}(\sigma) = \mathcal{R}(\sigma)$, for every $\sigma$. 

Observe that Fact 2 in general does not hold for other (even marking) algorithms. 
Set $\ell(n) = \lceil n/k \rceil$. The following fact will be useful in the sequel: 

Fact 3 $F \le \ell(n)$ deterministically for request sequences of length $n$. 

3 Analysis of LRU 

The main goal of this section is to bound 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]}$$

when $\sigma$ is generated by any of the adversaries defined above. We can express 
the competitive ratio of LRU as stated in the following lemma: 

Lemma 1. 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le 2\,\frac{\sum_{\ell=1}^{\ell(n)} E[N^\ell + O^\ell \mid F \ge \ell]\,P[F \ge \ell]}{\sum_{\ell=1}^{\ell(n)} E[N^\ell \mid F \ge \ell]\,P[F \ge \ell]} \tag{1}$$

We now introduce the following random variables, for all pairs $(\ell, j)$ with 
$j = 1, \ldots, k$: i) $X_j^\ell = 1$ if there is an $\ell$-th phase, there is a distinct $j$-th request in 
the $\ell$-th phase and this request is for a critical page; $X_j^\ell = 0$ otherwise (i.e. no $\ell$-th 
phase exists, or the phase is not complete and does not contain a $j$-th distinct 
request, or it does but this request is not for a critical page); ii) $Y_j^\ell = 1$ if the $j$-th 
distinct request in phase $\ell$ is to a new page, $Y_j^\ell = 0$ otherwise. A consequence 
of the definitions above is that $\sum_{j=1}^{k} Y_j^\ell = N^\ell$. 

Observe, again, that $Y_j^\ell$ does not depend on the paging algorithm considered, 
while $X_j^\ell$ does. Again, the algorithm is always understood from context. 

The rest of our proof goes along the following lines: intuitively, in the general 
case we have that, as $P[x \mid \sigma]$ becomes concentrated (i.e. variance decreases), 
most requests in the phase are for pages in cache, with the exception of a small 
number that at some point rapidly decreases with variance. In the other cases, 
we are able to bound the expected number of requests for old pages with the 
expected number of requests for new pages. We point out that in all cases the 
proof crucially relies on Fact 2. 



3.1 Convex Adversaries 

We consider the case in which $D(i) \in \mathcal{D}_2$, for every $i = 1, \ldots, n$. Intuitively, we 
shall prove that if the probability of requesting a critical page is more than twice 
the probability of requesting a new page, then the former has to be sufficiently 
small in absolute terms. For the rest of this subsection, $N_j^\ell$ is a random variable 
that denotes the number of requests for new pages that are presented before the 
$j$-th distinct request of phase $\ell$. In particular, $N_j^\ell = 0$ if no $\ell$-th phase exists, 
while $N_j^\ell = N^\ell$ if the $\ell$-th phase is not complete and does not contain a $j$-th 
distinct request. We now state the following lemma: 

Lemma 2. If the request sequence is generated by a convex adversary and $M > 4k$, 

$$P[X_j^\ell = 1 \mid F \ge \ell] \le 2P[Y_j^\ell = 1 \mid F \ge \ell] + \frac{16(j-1)}{k^2}\,E[N^\ell \mid F \ge \ell].$$

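Sketch of proof. Observe that the claim trivially holds if $j = 1$, since no critical 
pages exist in a phase before the first new page is requested. For $j \ge 2$ we can 
write: 

$$P[X_j^\ell = 1 \mid F \ge \ell] = \sum_{x=1}^{j-1} P[X_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x]\, P[N_j^\ell = x \mid F \ge \ell].$$

The assumption of convexity implies that, if $P[X_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x] > 2P[Y_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x]$, then the former is bounded by $16x^2/k^2$. As a 
consequence: 

$$P[X_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x] \le 2P[Y_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x] + \frac{16x^2}{k^2}.$$

We continue with: 

$$\begin{aligned}
P[X_j^\ell = 1 \mid F \ge \ell] &= \sum_{x=1}^{j-1} P[X_j^\ell = 1 \mid F \ge \ell \cap N_j^\ell = x]\, P[N_j^\ell = x \mid F \ge \ell] \\
&\le 2P[Y_j^\ell = 1 \mid F \ge \ell] + \frac{16}{k^2} \sum_{x=1}^{j-1} x^2\, P[N_j^\ell = x \mid F \ge \ell] \\
&\le 2P[Y_j^\ell = 1 \mid F \ge \ell] + \frac{16(j-1)}{k^2} \sum_{x=1}^{j-1} x\, P[N_j^\ell = x \mid F \ge \ell] \\
&\le 2P[Y_j^\ell = 1 \mid F \ge \ell] + \frac{16(j-1)}{k^2}\, E[N^\ell \mid F \ge \ell],
\end{aligned}$$

where the second inequality follows since $x \le j - 1$, while the third inequality 
follows since $N_j^\ell \le N^\ell$ by definition. 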

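We can now write: 

$$E[O^\ell \mid F \ge \ell] \le 2E[N^\ell \mid F \ge \ell] + \frac{16}{k^2}\, E[N^\ell \mid F \ge \ell] \sum_{j=1}^{k} (j-1) \le 10E[N^\ell \mid F \ge \ell],$$

where the first inequality follows by summing Lemma 2 over $j = 1, \ldots, k$, and the 
second follows from simple manipulations, since $\sum_{j=1}^{k} (j-1) = k(k-1)/2$. Recalling 
Equation (1), we can finally state the following Theorem: 

Theorem 2. If the request sequence is generated by a convex adversary, 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le 22.$$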

Note that Theorem 2 does not necessarily hold for other marking algorithms, 
since the proof of Lemma 2 crucially uses Fact 2. 
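The "simple manipulations" behind the constant 10 can be checked exactly. Assuming the extra term of Lemma 2 for the $j$-th distinct request is $\frac{16(j-1)}{k^2}E[N^\ell \mid F \ge \ell]$, summing over $j = 1, \ldots, k$ gives the coefficient $2 + 8(k-1)/k$, which stays below 10; and, if Equation (1) has the form $2\sum E[N^\ell + O^\ell]/\sum E[N^\ell]$ (our reading), the bound 10 yields the constant $2(1 + 10) = 22$. A small exact-arithmetic check (the function name is ours):

```python
from fractions import Fraction


def convex_coefficient(k: int) -> Fraction:
    """2 + (16 / k^2) * sum_{j=1}^{k} (j - 1): coefficient of E[N^l | F >= l]."""
    return 2 + Fraction(16, k * k) * sum(j - 1 for j in range(1, k + 1))


for k in (1, 2, 10, 1000):
    c = convex_coefficient(k)
    assert c == 2 + Fraction(8 * (k - 1), k)   # closed form of the sum
    print(k, c, c < 10)
print("ratio bound via Equation (1):", 2 * (1 + 10))  # 22
```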

3.2 Concentrated Adversary 

In this subsection we consider all adversaries such that $D(i) \in \mathcal{G}(\tau)$ for every 
$i = 1, \ldots, n$. We can state the following lemma: 

Lemma 3. If the request sequence is generated by a concentrated adversary then, 
for $j \ge 2$, 

$$P[X_j^\ell = 1 \cup Y_j^\ell = 1 \mid F \ge \ell] \le \left(\frac{2\tau}{k}\right)^2.$$

Lemma 3 and Fact 1 imply $E[O^\ell \mid F \ge \ell] = \sum_{j=2}^{k} P[X_j^\ell = 1 \mid F \ge \ell] \le (k-1)(2\tau/k)^2$. 
We can therefore conclude with: 

Theorem 3. If the request sequence is generated by a concentrated adversary, 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le \frac{8\tau^2}{k} + 2.$$
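The bound in Lemma 3 has the shape of Chebyshev's inequality: with mean at most $k/2$, a rank larger than $k$ (a page outside the $k$ most recent ones) lies more than $k/2$ away from the mean, so its probability is at most $\tau^2/(k/2)^2 = (2\tau/k)^2$. This is our reading of why the lemma holds, not the paper's omitted proof. A minimal numerical check on one hypothetical distribution:

```python
import math
from typing import Sequence


def tail_beyond(dist: Sequence[float], k: int) -> float:
    """P[x > k]: probability of requesting a page outside the k most recent ones."""
    return sum(dist[k:])


def chebyshev_bound(dist: Sequence[float], k: int) -> float:
    """(2*tau/k)^2 with tau the standard deviation; valid when the mean is <= k/2."""
    mu = sum(x * q for x, q in enumerate(dist, start=1))
    if mu > k / 2:
        raise ValueError("mean exceeds k/2: the distribution is not in G(tau)")
    tau = math.sqrt(sum((x - mu) ** 2 * q for x, q in enumerate(dist, start=1)))
    return (2 * tau / k) ** 2


dist = [0.6, 0.2, 0.1, 0.05, 0.03, 0.01, 0.01]   # mean 1.78 over ranks 1..7
k = 4
print(tail_beyond(dist, k), chebyshev_bound(dist, k))  # about 0.05 and 0.38
```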



3.3 Concave Adversaries 

In this section we provide an analysis of LRU for the case of concave adversaries. 
We start with the following Lemma: 

Lemma 4. If the request sequence is generated by a concave adversary and $M > 4k$, 

$$P[X_j^\ell = 1 \mid F \ge \ell] \le 2P[Y_j^\ell = 1 \mid F \ge \ell].$$

Corollary 1. If the request sequence is generated by a concave adversary and 
$M > 4k$, 

$$\frac{E[O^\ell \mid F \ge \ell]}{E[N^\ell \mid F \ge \ell]} \le 2.$$

Corollary 1 and Equation (1) immediately imply the following Theorem: 

Theorem 4. If the request sequence is generated by a concave adversary and 
$M > 4k$, 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le 6.$$

Theorem 4 does not necessarily hold for other marking algorithms, since its 
proof crucially uses Fact 2. Observe, also, the interesting circumstance that this 
adversary includes the uniform distribution as a special case. For this particular 
distribution, the results of Young [20,22] imply a competitive ratio $\Omega(1)$ as 
$M = k + o(k)$, which rapidly becomes $\Omega(\log k)$ as $M = k + O(1)$. 



3.4 General Adversaries 

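If $D(i) \in \mathcal{G}(\tau) \cup \mathcal{D}_1 \cup \mathcal{D}_2$ for every $i$, the following is a corollary of the results of 
Subsections 3.1, 3.2, and 3.3: 

Theorem 5. If the request sequence is generated by a general adversary, 

$$\frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le \max\left\{22, \frac{8\tau^2}{k} + 2\right\}.$$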
4 A Lower Bound for FWF 



We prove that under very general distributions, the performance of FWF 
degrades to $\Omega(k)$-competitiveness as locality increases, while the opposite occurs 
for LRU. This behaviour is in line with practical experience (see e.g. [18,3]). 
Together with the results of Section 3, this seems to us one theoretical validation of 
the great difference observed between the behaviour of FWF and other marking 
algorithms (in particular LRU). We prove the following: 

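Theorem 6. Assume $0 < \alpha < 1$ and assume request sequences of length 
$n \ge 2(k+1)/\alpha$ are generated by a diffuse adversary that, for every $i \le n$ and for 
every $\sigma$ such that $|\sigma| = i - 1$, satisfies the following constraint: 

$$P[r(i) \in \mathcal{R}(\sigma) \mid \Gamma(i) = \sigma] = 1 - \alpha.$$

Then, 

$$\frac{E[\mathrm{FWF}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \ge \frac{k}{2((k-1)\alpha + 1)}, \qquad \frac{E[\mathrm{LRU}(\sigma)]}{E[\mathrm{OPT}(\sigma)]} \le 2\alpha k + 2.$$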



Observe that $\alpha$ measures the amount of locality, i.e. locality increases as $\alpha$ 
becomes smaller. This theorem thus states that the long-term competitive ratio 
of FWF can be as bad as $\Omega(k)$ as locality increases, whereas under the same 
circumstances the competitive ratio of LRU tends to become constant. As an 
example, in the case of a general adversary Theorem 5 implies that this occurs 
as the standard deviation becomes $O(\sqrt{k})$. 
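The contrast in Theorem 6 can be observed in a small simulation. The sketch below is illustrative only: it implements LRU and FWF with cache size $k$, and a hypothetical adversary that, with probability $1-\alpha$, requests one of the $k$ most recently requested pages chosen uniformly and, otherwise, a uniformly chosen older page. The theorem covers any adversary satisfying the constraint; the uniform choices, the universe size `M`, and the function names are our assumptions.

```python
import random
from collections import OrderedDict
from typing import Hashable, List, Sequence


def lru_faults(requests: Sequence[Hashable], k: int) -> int:
    cache = OrderedDict()                    # last key = most recently used
    faults = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            faults += 1
            if len(cache) == k:
                cache.popitem(last=False)    # evict the least recently used page
            cache[p] = None
    return faults


def fwf_faults(requests: Sequence[Hashable], k: int) -> int:
    cache = set()
    faults = 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.clear()                # Flush-When-Full empties the cache
            cache.add(p)
    return faults


def locality_sequence(M: int, k: int, alpha: float, n: int, seed: int = 0) -> List[int]:
    """With prob. 1 - alpha request a recent page (age <= k), else an older one."""
    rng = random.Random(seed)
    stack = list(range(M))                   # recency order, index 0 = youngest
    out = []
    for _ in range(n):
        i = rng.randrange(k) if rng.random() < 1 - alpha else rng.randrange(k, M)
        page = stack.pop(i)
        stack.insert(0, page)
        out.append(page)
    return out


k, M, n = 16, 64, 20000
for alpha in (0.5, 0.1, 0.01):
    seq = locality_sequence(M, k, alpha, n)
    print(alpha, fwf_faults(seq, k) / lru_faults(seq, k))
```

As $\alpha$ shrinks, the printed FWF/LRU fault ratio should tend to grow, in line with the theorem; we do not claim specific values.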
Abstract. The process of reading a DNA molecule can be divided into 
three phases: sequencing, assembly and finishing. The draft sequence af- 
ter assembly has gaps with no coverage and regions of poor coverage. 
Finishing by walks is a process by which the DNA in these regions is re- 
sequenced. Selective sequencing is expensive and time consuming. There- 
fore, the laboratory process of finishing is modeled as an optimization 
problem aimed at minimizing laboratory cost. We give an algorithm that 
solves this problem optimally and runs in worst case O(n^i) time. 



1 Introduction 

The aggregate DNA in the human genome is about 3 billion base pairs long. 
In the course of sequencing, a DNA sample is replicated a number of times 
and clipped by restriction enzymes into smaller DNA molecules. The position 
at which a DNA molecule is cut depends on the restriction enzyme used and 
the target site to which it binds. These shorter DNA molecules are fed into 
sequencers which read off about 500 base pairs from one side of each molecule. 
Each 500 base pair length sequence read by a sequencer is called a read. In 
shotgun sequencing, the sequencer can read from both ends. The more reads there 
are, the smaller the unsequenced portions of the genome. Typically, a laboratory 
will use 6-fold coverage. This means that in this example the sequencers 
will read a total of about 18 billion base pairs. If we assume that each read 
of 500 base pairs is selected uniformly and independently at random from 
the DNA, then on average, 6% of the DNA is read fewer than 3 times. These 
regions of DNA are considered to be poorly covered. Sequencing machines are 
not a hundred percent accurate and occasionally will misread a base, skip a 
base or insert a nonexistent base. In order to guarantee some level of accuracy, 
it is common to demand that each base pair in the sequenced DNA be read 
some number of times. This gives some assurance as to the quality of data. 
The laboratory has to selectively resequence those parts of the genome which 
have poor coverage. Selectively targeting parts of DNA is an expensive and 
time-consuming process. The selected region that must be re-read is identified by a 
primer in the neighborhood of the target. This region is then amplified and 
reread. This process of targeted sequencing aimed at improving coverage is called 
finishing. The bottom line is that the greater the coverage, the lower the likelihood of 
error. There are various ways to benchmark the quality of sequenced DNA. One 
such measure is called “Bermuda Quality” [Hgp98]. Bermuda quality addresses 
the fact that DNA is double stranded. The two strands of DNA are referred 
to as the Watson and the Crick strands. The two strands are complements 
of each other. Essentially, Bermuda quality requires that every base pair must 
be read at least twice on one strand and at least once on the complementary 
strand. There is a tradeoff between cheap random sequencing and targeted 
sequencing [CKM+00]. Random sequencing has diminishing returns and each 
targeted resequencing has a high unit cost. The tradeoff is made by selecting a 
coverage ratio, such as the 6-fold coverage in the example above. 

* Supported by NSF Grant CCR-0311321. 
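The 6% figure is consistent with the standard Poisson approximation of random coverage: when reads are placed uniformly and independently, the number of reads covering a fixed base is approximately Poisson with mean equal to the coverage (6 here), so $P[X < 3] = e^{-6}(1 + 6 + 6^2/2) = 25e^{-6} \approx 0.062$. The Poisson approximation is our assumption about how the figure was obtained; a quick check:

```python
import math


def fraction_read_fewer_than(coverage: float, times: int) -> float:
    """P[X < times] for X ~ Poisson(coverage): expected fraction of bases
    covered by fewer than `times` reads under uniform random read placement."""
    return sum(math.exp(-coverage) * coverage ** i / math.factorial(i)
               for i in range(times))


print(round(fraction_read_fewer_than(6, 3), 4))  # 0.062
```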

Constructing the primer is the most expensive step in such a laboratory 
procedure. Resequenced reads are called walks as opposed to the untargeted ones 
which are called reads. After sequencing and assembly, we are left with regions 
of DNA which do not meet the Bermuda quality requirement. The laboratory 
must add walks to finish the sequence. Each walk requires the construction of 
a primer. Once a primer is constructed for some site in the genome, it may 
be reused for a second walk at the same site on the same strand. This second 
walk has a lower laboratory cost because the same primer is reused. This reduced 
cost for reusing a primer was introduced by [CKM+00] in modeling this problem. 

Naturally this problem has been posed as an optimization problem. The goal 
is to minimize the cost of resequencing needed. Programs that perform such 
optimization are described in [GAG98,GDG01]. They do not bound running 
times. This problem’s complexity was first studied by [PT99] who formulated it 
as an integer linear program. The cost model we study in this paper was shown 
to have an $O(n^3)$ dynamic programming solution in [CKM+00]. Hart [Har02] 
gave the first sub-cubic algorithm for this problem. [Har02] also discusses a 
randomized model in which their algorithm runs in expected linear time. 

We develop an algorithm that has a worst case running time of Our 

algorithm also guarantees the expected $O(n)$ running time for the randomized 
model discussed in [Har02] without any modification. The problem is introduced 
in section 2 and the basic algorithm is developed in section 3. Further 
improvements are discussed in section 4. 

2 Model and Problem 

Let the problem interval be $[0, n) \subset \mathbb{R}$ where $n$ is any real number greater 
than 1. A read is a unit length interval on the real number line. Each read, $r$, has a 
start position, $p(r) \in [0, n-1)$, a direction, $d(r) \in \{\leftarrow, \rightarrow\}$, and is associated 
with the unit real interval $[p(r), p(r)+1)$. Given a set of reads, $S$, the goal is to 
find a set of unit intervals called walks, $W$. Each walk, $w \in W$, has a position, 
$p(w) \in [0, n)$, a direction, $d(w) \in \{\leftarrow, \rightarrow, \Leftarrow, \Rightarrow\}$, and is associated with the 
unit real interval $[p(w), p(w)+1)$. 

Any point in the problem interval is covered by a walk or read if it lies inside 
the interval corresponding to the walk or read. A point covered by a walk with 
direction $\leftarrow$ is the same as if it were covered by a read with direction $\leftarrow$. 
Similarly, a walk with direction $\rightarrow$ is the same as a read with direction $\rightarrow$. A 
point covered by a walk with direction $\Leftarrow$ is the same as if it were covered by 
two reads of direction $\leftarrow$. Similarly, a walk with direction $\Rightarrow$ is the same as two 
reads with direction $\rightarrow$. A point is adequately covered by the reads and walks 
if and only if it lies inside at least 2 unit intervals with direction $\rightarrow$ and at 
least 1 unit interval with direction $\leftarrow$, or it lies inside at least 2 intervals with 
direction $\leftarrow$ and 1 interval with direction $\rightarrow$. Each walk has an associated 
cost, denoted $c(\cdot)$: $c(\rightarrow) = c(\leftarrow) = 1$ and $c(\Leftarrow) = c(\Rightarrow) = 1 + \epsilon$, where 
$0 < \epsilon < 1$ and $\epsilon$ is a rational number. The cost of a set of walks, $W$, is equal to 
the sum of the costs of the walks in $W$. In addition, a set of reads, $S$, and a set 
of walks, $W$, adequately cover the problem interval iff every point in the problem 
interval is adequately covered. 
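The coverage rule and the cost function translate directly into code. In this minimal sketch the directions are encoded as ASCII strings of our own choosing: `'>'` for $\rightarrow$, `'<'` for $\leftarrow$, `'>>'` for $\Rightarrow$ and `'<<'` for $\Leftarrow$. Reads and walks are `(position, direction)` pairs, and $\epsilon$ is a `Fraction` so that costs stay exact.

```python
from fractions import Fraction
from typing import Iterable, Tuple

Interval = Tuple[float, str]   # (position p, direction); covers [p, p + 1)


def strand_counts(x: float, intervals: Iterable[Interval]) -> Tuple[int, int]:
    """Number of '>' and '<' intervals covering x; a doubled walk counts twice."""
    fwd = rev = 0
    for p, d in intervals:
        if p <= x < p + 1:
            fwd += d.count(">")
            rev += d.count("<")
    return fwd, rev


def adequately_covered(x: float, intervals: Iterable[Interval]) -> bool:
    """At least 2 intervals in one direction and at least 1 in the other."""
    fwd, rev = strand_counts(x, intervals)
    return (fwd >= 2 and rev >= 1) or (rev >= 2 and fwd >= 1)


def walk_cost(walks: Iterable[Interval], eps: Fraction) -> Fraction:
    """c(->) = c(<-) = 1 and c(=>) = c(<=) = 1 + eps."""
    return sum((1 + eps if len(d) == 2 else Fraction(1) for _, d in walks), Fraction(0))


reads = [(0.0, ">"), (0.5, ">")]
print(adequately_covered(0.75, reads))                  # False: no '<' coverage
print(adequately_covered(0.75, reads + [(0.25, "<")]))  # True
print(walk_cost([(0.0, "<<"), (1.0, ">")], Fraction(1, 3)))  # 7/3
```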

In the above formulation the use of the real line is just a convenience and 
can easily be replaced with integers. The set $S$ corresponds to the set of reads 
obtained from the first round of sequencing. For simplicity, each read is assumed 
to be of unit length. The set of walks in the solution corresponds to the required 
resequencing. The interval is adequately covered when the union of reads and 
walks meets Bermuda quality. Walks of the type $\Leftarrow$ and $\Rightarrow$ are the equivalent of 
performing two walks on the same strand and at the same site. This is reflected 
in the lab cost of performing such walks: $1 + \epsilon$. Two $\leftarrow$'s cost 2 while one $\Leftarrow$ 
costs less than 2. 

Finishing: Given a problem instance, $S$, find a set of walks $W$ with 
minimum cost such that $[0, n)$ is adequately covered. 



Fig. 1. The two “cityscapes”: the Watson cityscape above the DNA line and the Crick cityscape below it. 



Figure 1 is an example of a problem instance. Each arrow denotes a unit 
length interval. Some reads are on the Watson strand and some are on the Crick 
strand. The bold parts of the center line highlight the regions of DNA that are 
adequately covered (of Bermuda quality) by the given reads. Czabarka et al. introduced 
the term cityscape to describe the coverage on one strand as a function that takes 
values 0, 1 or 2. The cityscape above (resp. below) the center line specifies the 
coverage on the Watson strand (resp. Crick strand). If a base on a particular 
strand is read more than twice, then this gives us no new information. It suffices 
to say that the cityscape is at a value of 2 at such a point. The coverage at a 
point is adequate when the sum of the two cityscapes is at least 3. 
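The cityscape formulation is equivalent to the definition of adequate coverage given earlier: capping each strand's coverage at 2, the sum is at least 3 exactly when one strand has coverage 2 and the other at least 1. The sketch below checks this equivalence over all small coverage counts; the function names are ours.

```python
from itertools import product


def cityscape(count: int) -> int:
    """Coverage on one strand as a cityscape value in {0, 1, 2}."""
    return min(count, 2)


def adequate_by_definition(fwd: int, rev: int) -> bool:
    return (fwd >= 2 and rev >= 1) or (rev >= 2 and fwd >= 1)


def adequate_by_cityscapes(watson: int, crick: int) -> bool:
    return cityscape(watson) + cityscape(crick) >= 3


# Reads beyond the second on a strand add nothing, so counts up to 5 suffice.
print(all(adequate_by_definition(a, b) == adequate_by_cityscapes(a, b)
          for a, b in product(range(6), repeat=2)))  # True
```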

3 Algorithm Development 

We start by introducing some definitions and notation. There may be many 
different solutions with the same cost. If we are given any optimum solution, we 
can shift each walk to the right for as long as this shifting does not violate the 
coverage requirement. Since the walks, and hence their total cost, are unchanged, the 
solution remains optimal. This is called the shifted optimum. 

A feasible partial solution at $x$ is a set of walks, $W$, such that its walks are 
shifted, every point in the interval $[0, x)$ has adequate coverage, and some 
point in $[x, x+1)$ has inadequate coverage. The set $\mathcal{W}_x$ is the set of all feasible 
partial solutions at $x$. A chain in a feasible partial solution is a maximal sequence 
of walks $(w_1, w_2, \ldots, w_k)$ stacked end to end, such that $p(w_{i+1}) = p(w_i) + 1$, for 
$i = 1, \ldots, k-1$. Keep in mind that a walk could be a part of more than one chain. 
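Since several walks may start at the same position, chains are best enumerated as maximal paths: a chain starts at a walk with no walk ending where it begins, and ends at a walk with no walk starting where it ends. A minimal sketch, assuming walks are given only by their start positions as exact numbers (integers here, so that $p(w)+1$ compares exactly); `chains` is our own name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def chains(positions: Sequence[int]) -> List[List[int]]:
    """Return all chains as lists of walk indices (w_1, ..., w_k) with
    p(w_{i+1}) = p(w_i) + 1; a walk may appear in more than one chain."""
    starting_at: Dict[int, List[int]] = defaultdict(list)
    for i, p in enumerate(positions):
        starting_at[p].append(i)
    result: List[List[int]] = []

    def extend(path: List[int]) -> None:
        successors = starting_at.get(positions[path[-1]] + 1, [])
        if not successors:                   # nothing stacked on top: chain ends
            result.append(path)
        for j in successors:
            extend(path + [j])

    for i, p in enumerate(positions):
        if not starting_at.get(p - 1):       # no walk ends where w_i starts
            extend([i])
    return result


# Walks 1 and 4 both start at position 1, so walks 0 and 2 lie on two chains.
print(chains([0, 1, 2, 5, 1]))  # [[0, 1, 2], [0, 4, 2], [3]]
```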

Figure 2 shows an optimum set of walks for the problem instance described in 
figure 1. Note that with the walks the sequence meets Bermuda quality. Figure 
2 also shows two non-trivial chains and illustrates how two walks at the same 
point and the same strand reduce the overall cost. The cost of the walks in figure 
2 is $7 + 4\epsilon$. 



Fig. 2. A shifted optimum (Watson reads, Crick reads, and walks). 



3.1 The Problem Instance 

A problem instance, S, is a set of reads in a problem interval of length n. The 
reads may be concentrated heavily in some part of the problem interval while 
there may be other parts that have no reads at all. A problem in which the 
read distribution varies wildly can be reduced to a new problem instance with 
a relatively even distribution of reads (with a bound on the read density) while 
guaranteeing that the solution for the reduced instance can be used to recover 
a solution for the original problem in linear time. 

Lemma 1. Any problem instance can be reduced to one in which: 

(i) there are no more than 8 reads that start¹ in any unit interval, and 

(ii) there is no stretch of length greater than 2 without a read. 

Proof (i). Assume that some unit interval has more than 4 reads that start within 
it and have the same direction. According to the requirements for adequate 
coverage there need be only 2 reads with like direction over any given point. So 
all reads that start after the first two but before the last two can be dropped. 
Similarly there are no more than 4 reads in the other direction. □ 

Proof (ii). Assume that there is a read-less stretch, $(p(r_1) + 1, p(r_2))$, such that 
$p(r_2) - p(r_1) > 3$, for $r_1, r_2 \in S$. Consider the shifted optimum solution $W_{OPT}$ 
for this instance. Let $B$ be the set of walks that start in the interval 
$I_1 = [p(r_1) + 1, p(r_1) + 2)$. All other walks in this interval must belong to the same chain 
as one of the walks in $B$ (if not, then $W_{OPT}$ is not shifted). Since $B$ adequately 
covers some point in $I_1$, we can be assured that, up to the interval 
$I_2 = [p(r_1) + \lceil p(r_2) - p(r_1) \rceil - 2,\; p(r_1) + \lceil p(r_2) - p(r_1) \rceil - 1)$, each interval is covered by 
a set of walks that costs no more than $c(B)$ (if not, then this is not OPT). Note 
that $c(B)$ is the lowest possible cost required to cover a unit length read-less 
interval. So we can remove the interval $[p(r_1) + 1, p(r_1) + \lceil p(r_2) - p(r_1) \rceil - 1)$ 
from the solution, thereby creating a new problem instance without the long 
read-less stretch and with a solution of cost $c(W_{OPT}) - (\lceil p(r_2) - p(r_1) \rceil - 2) \cdot c(B)$. 

Assume that this solution is not optimal for the new instance, and let $W'$ be 
the optimum for it. Now, using the same construction as above but in reverse, 
find $W''$, which is a solution set for instance $S$. 
Note that $c(W'') = c(W') + (\lceil p(r_2) - p(r_1) \rceil - 2) \cdot c(B)$. This means that 
$c(W'') < c(W_{OPT})$, a contradiction. Hence the lemma holds. □ 

From here on we will assume that every problem instance has been stripped 
of unnecessary reads and has no read-less stretches of length more than 2 in 
accordance with lemma 1. 
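The pruning in part (i) of Lemma 1 can be sketched as follows. This is a simplification under our own assumption that the unit intervals are the integer-aligned buckets $[m, m+1)$: within each bucket and direction it keeps the first two and the last two reads, which preserves every point's per-strand coverage up to the cap of 2. The names and the `(position, direction)` encoding are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Tuple[float, str]   # (start position p, direction '<' or '>')


def prune_reads(reads: List[Read]) -> List[Read]:
    """Keep at most 4 reads per (bucket [m, m+1), direction): the first two and
    the last two by start position. At most 8 reads then start in each bucket."""
    groups: Dict[Tuple[int, str], List[Read]] = defaultdict(list)
    for p, d in reads:
        groups[(math.floor(p), d)].append((p, d))
    kept: List[Read] = []
    for group in groups.values():
        group.sort()
        kept.extend(group if len(group) <= 4 else group[:2] + group[-2:])
    return sorted(kept)


reads = [(i / 8, ">") for i in range(1, 7)] + [(0.5, "<")]
print(prune_reads(reads))
# [(0.125, '>'), (0.25, '>'), (0.5, '<'), (0.625, '>'), (0.75, '>')]
```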



3.2 Characterization of the Optimum Solution 

Lemma 2. For any given problem, $S$, the shifted optimum, $W_{OPT}$, satisfies the 
following properties: 

(i) $W_{OPT}$ has at most 3 walks that start at positions within any unit interval, 

(ii) $W_{OPT}$ has at most 3 walks that cross any point, 

(iii) every chain in $W_{OPT}$ starts at the end of a read or at 0, and 

(iv) if, in $W_{OPT}$, 3 walks end inside a read, then one of the first two must be 
the last walk of its chain. 

¹ Any reference to the beginning or start of a read or walk refers to its position, $p(\cdot)$; 
the direction has no relevance to where it starts. For example, a read $[a, a+1)$ starts 
at $a$ even if its direction is $\leftarrow$. 

Proof (i). Consider some interval $[x, x+1)$. Let $w_1, w_2, \ldots, w_k \in W_{OPT}$ be such 
that $x \le p(w_1) \le p(w_2) \le \cdots \le p(w_k) < x + 1$. Note that $W_{OPT}$ is a shifted 
optimum, so no walk can be shifted right without compromising coverage. For 
the sake of contradiction let $k > 3$. If $d(w_1) = d(w_2) = d(w_3)$ then $w_3$ can be 
shifted out of this interval. If any of the first 3 walks differ in direction then the 
interval has adequate coverage and hence $w_4$ can be shifted out of the interval. 

□ 

Proof (ii). Consider any point $x \in [0, n]$. Let $w \in W_{OPT}$ be such that 
$x \in [p(w), p(w)+1)$ and $w$ is the walk with the lowest such $p(w)$. From lemma 2(i), 
at most 3 walks can start in the interval $[p(w), p(w)+1)$. Hence there are no 
more than 3 walks that cross $x$. □ 

Proof (iii). Let $w$ be the first walk in the given chain. Since this is a shifted 
optimum, shifting w to the right would have resulted in a point with inadequate 
coverage. Provided this point is not at 0, there exists a point just before this that 
does have adequate coverage. This means that either a read or a walk terminated 
at this point. Since the chain is maximal by definition, this cannot be a walk. 
So it must be a read. □ 

Proof (iv). Let $w_1$, $w_2$ and $w_3$ be the three walks that end inside read $r \in S$ 
in a shifted optimum, in ascending order of position. Assume for the sake of 
contradiction that some $w_4, w_5 \in W$ satisfy $p(w_4) = p(w_1) + 1$ and 
$p(w_5) = p(w_2) + 1$. 

If $p(w_4) < p(w_5) < p(w_3) + 1$, then in the interval $(p(w_4), p(w_5))$, walks 
$w_3$, $w_4$ and read $r$ together provide adequate coverage. If this were not so, $w_1$ 
would serve no purpose and could be shifted out of $(p(w_4), p(w_5))$. Meanwhile, in the 
interval $(p(w_5), p(w_3) + 1)$, $w_4$, $w_3$ and $r$ must provide adequate coverage, implying that 
$w_5$ should be shifted out. 

If $p(w_4) = p(w_5) < p(w_3) + 1$, then over the interval $(p(w_4), p(w_3) + 1)$, 
irrespective of what direction $r$ and $w_3$ have, either $w_4$ or $w_5$ must be shifted 
right. 

If $p(w_4) = p(w_5) = p(w_3) + 1$, we know from lemma 2(i) that there cannot 
be a third walk that starts in this interval. Since $w_1$, $w_2$ and $w_3$ start at the 
same point, we can let $w_3$ be first. □ 

3.3 A Partial Solution 

Let L be the length of the longest chain in any of the feasible partial solutions. 
Note that L < n. The parameter L is useful in considering other models, such 
as the one in which reads in the problem instance are randomly distributed. In 
such cases the longest possible chain is of constant length with high probability. 

Lemma 3. Let $\mathcal{W}_x$ be the set of feasible partial solutions at $x \in [0, n)$. Then 
$\mathcal{W}_x$ satisfies the following properties: 

(i) if $W_1, W_2 \in \mathcal{W}_x$ then $|c(W_1) - c(W_2)| \le 2 + \epsilon$, 

(ii) if $\epsilon$ is a constant rational and $\mathcal{W}_x$ is partitioned into sets of equal cost, 
then there are $O(1)$ such sets, and 

(iii) $|\mathcal{W}_x| = O(L)$. 

Proof (i). Let $P$ be a set of walks such that $W_1 \cup P$ is an optimum solution for 
the given problem. Let the cost of the optimum solution be OPT. In addition, 
let $c(W_1) = c(W_2) + \Delta$, where $\Delta$ is a positive number. We know that $W_2$ covers 
every point before $x$ adequately and $P$ covers every point after $x + 1$ adequately. 
In addition, a $\Leftarrow$ and a $\rightarrow$ are all that are needed to cover $[x, x+1)$ adequately. 
Therefore, $c(W_2) + 2 + \epsilon + c(P) \ge \mathrm{OPT}$. Substituting for $c(W_2)$, we get 
$c(W_1) + c(P) + 2 + \epsilon - \Delta \ge \mathrm{OPT}$. We know that $c(W_1) + c(P) = \mathrm{OPT}$. This implies that 
$\Delta \le 2 + \epsilon$. □ 

Proof (ii). Let e = such that p, g € IN, gcd(p, q) = 1. Consider any Wi,W 2 € 
Wx such that c{W\) < c(W 2 ) < c{W\) + There exist x,y € Z such that 
c{W 2 ) — c{W\) = x + ye. This implies that 0 < x + y^ < So, 0 < xq + yp < 1. 
But xq + yp is an integer. Contradiction. 

Due to lemma 3(i) the solution with the largest cost and the solution with 
the least cost in Wx differ by no more than 2 + e. Hence, there can be only 
(2 + e)q = 0(1) distinct equivalence classes of equal cost in Wx- □ 
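The counting argument in part (ii) rests on the fact that every cost has the form $s + t(1+\epsilon)$ for nonnegative integers $s$ (single walks) and $t$ (doubled walks), hence is an integer multiple of $1/q$ when $\epsilon = p/q$. The sketch below, with a hypothetical cap on the number of walks, enumerates such costs for $\epsilon = 1/3$ and counts how many fall in a window of width $2 + \epsilon$:

```python
from fractions import Fraction
from itertools import product


def costs(eps: Fraction, max_walks: int = 20):
    """All costs of walk sets with s single walks and t doubled walks."""
    return {s + t * (1 + eps) for s, t in product(range(max_walks + 1), repeat=2)}


def classes_in_window(eps: Fraction, low: Fraction, max_walks: int = 20) -> int:
    """Number of distinct costs in [low, low + 2 + eps]."""
    high = low + 2 + eps
    return len({c for c in costs(eps, max_walks) if low <= c <= high})


eps = Fraction(1, 3)
assert all((c * eps.denominator).denominator == 1 for c in costs(eps))  # multiples of 1/q
print(classes_in_window(eps, Fraction(5)))  # 8, i.e. (2 + eps) * q + 1
```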

Proof (iii). Consider any $W \in \mathcal{W}_x$. We know, from lemma 2(ii), that in $W$ there 
may be no more than 3 walks that cross point $x$. Let us consider 3 chains $C_1$, $C_2$ 
and $C_3$ that correspond to the walks that cross $x$. In addition, let 
$|C_1| \le |C_2| \le |C_3|$. If there aren't sufficient chains, let the smallest chains be empty chains, 
with $p(C_i) = x$ and $|C_i| = 0$, for example. 

We claim that $|C_1| \le 4$. As a consequence of lemma 1(ii), there must be a full 
read in the interval $[x - 4, x)$. Lemma 2(iv) states that all three chains cannot 
go beyond this read. Hence, $|C_1| \le 4$. Lemma 1(i) states that there are no more 
than $8 \cdot 4 = 32$ reads in $[x - 4, x)$. Lemma 2(iii) implies that $C_1$ can possibly 
start at at most 32 different positions in $[x - 4, x)$. In other words, there are no 
more than 32 different positions in $[x, x + 1)$ where $C_1$ can end. Let us call this 
set of positions $P_1$; $|P_1| \le 32$. 

Now look at the set of possible endpoints of the chain $C_3$. Lemmas 2(iii) and 
1(i) imply that $C_3$ may start at no more than $8L$ different positions and therefore 
end at no more than $8L$ different positions. Let us call this set of positions $P_3$; 
$|P_3| \le 8L$. 

There are 3 possible directions that the last walk in any chain can take and 
a 4th possibility that the chain does not exist. Let us call the set of all different 
possible directions of the last walks of any $W$ the set $Q$; $|Q| \le 4^3 = 64$. 

From lemma 3(ii) we know that there is a constant-sized set of possible costs 
for any solution in $\mathcal{W}_x$. Let us call this set of costs $K$; $|K| = O(1)$. 

We can therefore classify any $W \in \mathcal{W}_x$ by a set of equivalence classes. The set 
of classes is $T = P_1 \times P_3 \times Q \times K$, with $|T| = O(L)$. Next we show that there are 
0 or 1 nonredundant solutions in each class. We define the characteristic, $t_x(\cdot)$, 
of a feasible partial solution at $x$ as the equivalence class to which it belongs, 
i.e. $t_x(W) \in T$. 

Consider two feasible partial solutions $W_1, W_2 \in \mathcal{W}_x$ such that 
$t_x(W_1) = t_x(W_2)$. The only thing in which the two solutions differ is the end position of 
chain $C_2$. Let $u$ be the end point of chain $C_2$ in $W_1$ and let $v$ be the end point 
of $C_2$ in $W_2$. Without loss of generality, let $u > v$. Say that $W_2$ goes on to form an 
optimum solution. Simply take every walk that was added to $W_2$ after $x$ and place it in 
$W_1$. Clearly, $W_1$ must also form an optimum. So, $W_2$ is expendable. We say that 
$W_1$ dominates $W_2$. Hence $|\mathcal{W}_x| = O(L)$. □ 

3.4 The Algorithm 

We introduce some additional notation before presenting the algorithm. As defined in Lemma 3(iii), any partial feasible solution $W \in \mathcal{W}_x$ has a characteristic $t_x(W)$. The characteristic is a tuple that specifies the directions of the reads that cross $x$, the cost $c(W)$, the end position of the longest chain that crosses $x$, and the end position of the shortest chain that crosses $x$. If only one or two chains cross $x$, then the end position of the shortest chain is taken to be $x$. If no chains cross $x$, then both the shortest and the longest end positions are set to $x$. If $W_1, W_2 \in \mathcal{W}_x$, $W_1 \ne W_2$ and $t_x(W_1) = t_x(W_2)$, then by the construction in Lemma 3(iii) one of the solutions $W_1$ or $W_2$ is expendable. So, if we say that $W_1$ dominates $W_2$, then $W_2$ can be discarded.

If $W \in \mathcal{W}_x$, then there may be points in $[x, x + 1)$ at which coverage is inadequate. We define $\mathit{augment}(W)$ to be the set of all possible ways of adding walks to $W$ such that the interval $[x, x + 1)$ has adequate coverage. Note that all walks added are always flush with some walk or a read. Clearly, there can be no more than $3^3 = 27$ different ways of doing this. We now present algorithm Finish in Figure 3.

Theorem 1. The algorithm Finish(S) finds a set of walks with least cost in $O(nL)$ time.

Proof. We first show, by induction on $x$, that Finish finds a solution with optimum cost. We want to show that if $B_x$ contains a solution which is a subset of an optimum, then $B_{x+1}$ contains a solution which is a subset of an optimum. Suppose that $B_x$ contains a solution $W$ that is a subset of some optimum. We know from Lemma 3(i) that there can be no other $W' \in B_x$ such that $c(W) > c(W') + 2 + \epsilon$, so $W$ survives the pruning by cost. If $W$ is dominated by some other solution, then even though $W \notin \mathcal{W}_x$, there is some solution $W'' \in \mathcal{W}_x$ which dominates $W$, and hence $W''$ can form an optimum solution. As long as $\mathcal{W}_x$ contains a feasible solution that is a subset of some optimum, one of its augmented solutions must also be a subset of some optimum solution in $B_{x+1}$. In the base case, $\mathcal{W}_0 = \{\emptyset\}$, and the empty solution is a subset of any optimum solution. Hence, Finish finds a solution with optimum cost.

Next we show that the running time is $O(nL)$. The outermost loop runs for $O(n)$ rounds. In any round $x$, the first for loop runs in $|B_x|$ time. The second for loop looks quadratic, but it can be done in linear time. We group every solution by its characteristic and then, for each characteristic, keep the one with the longest middle chain. So, this loop takes no more than $|B_x| + c_1 L$ time for some constant $c_1$ (related to the number of equivalence classes). In the third for loop, augment adds no more than 27 new solutions for each solution in $\mathcal{W}_x$, so it takes $27|\mathcal{W}_x|$ time. The total time taken is therefore $\sum_{x=0}^{n} (2|B_x| + 27|\mathcal{W}_x| + c_1 L)$. Note that the sets $B_x$ are constructed from elements of $\mathcal{W}_{x-1}$, so $|B_x| \le 27|\mathcal{W}_{x-1}|$. The total time is therefore $O(nL) + 81 \sum_x |\mathcal{W}_x|$. From Lemma 3(iii), $|\mathcal{W}_x| = O(L)$. Hence Finish runs in worst-case time $O(nL)$. □

Algorithm Finish(S)
1:  B_0 ← {∅}
2:  for x ← 0 to n − 1
3:      B_{x+1} ← ∅
4:      c* ← min_{W ∈ B_x} c(W)
5:      for every W ∈ B_x
6:          if c(W) > 2 + ε + c* then B_x ← B_x \ {W}
7:      end for
8:      for every W_1, W_2 ∈ B_x such that t_x(W_1) = t_x(W_2)
9:          if W_1 dominates W_2 then B_x ← B_x \ {W_2}
10:     end for
11:     W_x ← B_x
12:     for every W ∈ W_x
13:         B_{x+1} ← B_{x+1} ∪ augment(W, x)
14:     end for
15: end for
16: return argmin_{W ∈ B_n} c(W)

Fig. 3. Finish Algorithm
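The control flow of Fig. 3 can be sketched in Python. This is a minimal sketch under stated assumptions. The representation of a partial solution is left abstract. The callbacks `cost`, `characteristic`, `middle_end` and `augment` are hypothetical stand-ins for $c(\cdot)$, $t_x(\cdot)$, the end point of chain $C_2$, and $\mathit{augment}(W, x)$. Only three parts follow the text directly:

- the pruning by cost;
- the linear-time domination step, which keeps, for each characteristic, the solution whose middle chain ends furthest to the right;
- the augmentation loop.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

S = TypeVar("S")  # a partial solution; its representation is left abstract


def finish(n: int,
           eps: float,
           empty: S,
           cost: Callable[[S], float],
           characteristic: Callable[[S, int], Hashable],
           middle_end: Callable[[S], float],
           augment: Callable[[S, int], Iterable[S]]) -> S:
    """Sketch of algorithm Finish (Fig. 3); every callback is a hypothetical stand-in."""
    B: List[S] = [empty]  # B_0 = {empty solution}
    for x in range(n):
        c_star = min(cost(W) for W in B)
        # Lines 5-7: discard solutions costing more than c* + 2 + eps (Lemma 3(i)).
        B = [W for W in B if cost(W) <= c_star + 2 + eps]
        # Lines 8-10 in linear time: per characteristic, keep the dominating solution,
        # i.e. the one whose middle chain C_2 ends furthest to the right.
        best: Dict[Hashable, S] = {}
        for W in B:
            key = characteristic(W, x)
            if key not in best or middle_end(W) > middle_end(best[key]):
                best[key] = W
        W_x = list(best.values())  # line 11
        # Lines 12-14: every way of covering [x, x + 1) adequately.
        B = [W2 for W in W_x for W2 in augment(W, x)]
    return min(B, key=cost)  # line 16: argmin over B_n


# Toy run: a "solution" is just its cost, and each step adds a walk costing 1 or 1.5.
# All solutions share one characteristic, so only one survives each round.
print(finish(3, 0.5, 0.0, cost=lambda W: W, characteristic=lambda W, x: 0,
             middle_end=lambda W: 0, augment=lambda W, x: [W + 1.0, W + 1.5]))  # 3.0
```

The dictionary keyed by characteristic is what makes the domination step linear rather than quadratic, as argued in the running-time analysis.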

Since the length $L$ cannot be larger than $n$, Finish runs in worst-case $O(n^2)$ time. It can be shown that if the problem instance is random and the number of reads is $O(n)$, then the length of the longest chain is constant with high probability. This implies a linear-time algorithm. This randomized model is discussed in [Har02] and extends naturally to our algorithm without any modification.

4 Divide and Conquer 

Finish can be sped up by localizing the computations. We show that making the computation either too fine-grained or too coarse slows down the algorithm. This is an adaptation of the technique used by Hart [Har02].

We split the interval of the problem instance into subintervals of equal length $m$. We borrow terminology from [Har02] and call each of these a division. There are $\lceil n/m \rceil$ divisions: the first division is the interval $[0, m)$, the second division is $[m, 2m)$, and so on. It follows from Lemma 1 that there are $O(m)$ reads in each division.

We index divisions with the subscript $j$. For the $j$th division, let $D_j$ be the set of all reads that have a point in common with $[mj, mj + m)$. For each $r_i \in D_j$, let $f_{i,j}$ be the fractional part of the real number $p(r_i)$. The subscript $i$ orders the reads in ascending order of these fractional parts, so that $0 \le f_{0,j} \le f_{1,j} \le f_{2,j} \le \cdots < 1$. These fractional values define small intervals, each of which is called a fraction. There are $O(m)$ fractions corresponding to any division. We can now partition the set of all feasible partial solutions $\mathcal{W}_{mj}$ into equivalence classes.

Algorithm Finish2
1:  W_0 ← {∅}
2:  for j ← 0 to ⌈n/m⌉ − 1
3:      for each divisional characteristic in W_{mj}, select one W ∈ W_{mj}
4:          P ← the set of walks in W that cross mj
5:          call Finish(D_j ∪ P)
6:          let W' be the set of feasible solutions returned by Finish
7:          reconstruct all possible feasible solutions from W' (Lemma 4); let these be B_{m(j+1)}
8:          create W_{m(j+1)} from B_{m(j+1)} according to Lemma 3 by discarding unnecessary solutions
9:      end for
10: end for

Fig. 4. Algorithm Finish2

Recall that $\mathcal{W}_x$ is the set of feasible partial solutions such that, for each solution in the set, every point on the real line before $x$ is adequately covered. Since we are at the beginning of the $j$th division, we set $x = mj$. $\mathcal{W}_x$ is partitioned into equivalence classes by the characteristic function $t_x(\cdot)$. Along the same lines, we define a new characteristic function $d_x(\cdot)$, the divisional characteristic, which partitions $\mathcal{W}_x$ into equivalence classes with respect to the $j$th division. Intuitively, two feasible partial solutions with the same divisional characteristic can be treated as one solution for the entire division.

Consider any solution $W \in \mathcal{W}_x$. At most 3 walks protrude beyond $x$ in this solution. The fractional part of the position of each of these 3 walks lies in one of the fractions of this division. Each solution can be categorized by the 3 fractions into which its last three walks fall, and further by the directions of those walks. These 6 parameters define the divisional characteristic of a feasible solution. From the proof of Lemma 3(iii), we know that the shortest of these three chains can end in at most $O(1)$ positions. Therefore, each partial solution can have one of $O(m^2)$ possible divisional characteristics.
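The fractions of a division can be computed by sorting the fractional parts of the read positions. A walk end is then assigned to a fraction by binary search. The sketch below is hypothetical and is not the paper's code. It rests on three assumptions:

- read and walk positions are plain floats;
- fractional parts are read on a circle, so $[f_{\text{last}}, 1) \cup [0, f_0)$ counts as one fraction;
- the direction labels `"fwd"` and `"rev"` are placeholders.

The tuple it returns is one possible encoding of the divisional characteristic $d_x(\cdot)$.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def fraction_boundaries(read_positions: Sequence[float]) -> List[float]:
    """Sorted distinct fractional parts f_{0,j} <= f_{1,j} <= ... < 1 of the reads in D_j."""
    return sorted({p - math.floor(p) for p in read_positions})


def fraction_index(boundaries: List[float], position: float) -> int:
    """Label of the fraction containing frac(position).

    Label k >= 1 is [f_{k-1}, f_k); label 0 is the wrap-around piece [f_last, 1) ∪ [0, f_0)
    (assumption: fractional parts are read on a circle).
    """
    if not boundaries:
        return 0
    k = bisect.bisect_right(boundaries, position - math.floor(position))
    return k % len(boundaries)


def divisional_characteristic(boundaries: List[float],
                              walks: Sequence[Tuple[float, str]]) -> Tuple[Tuple[int, str], ...]:
    """Hypothetical encoding of d_x(W): (fraction label, direction) of each protruding walk."""
    return tuple((fraction_index(boundaries, end), direction) for end, direction in walks)


b = fraction_boundaries([3.25, 4.5, 5.75, 6.25])
print(b)  # [0.25, 0.5, 0.75]
print(divisional_characteristic(b, [(7.6, "fwd"), (9.25, "rev"), (8.9, "fwd")]))
# ((2, 'fwd'), (1, 'rev'), (0, 'fwd'))
```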

Lemma 4. Let $W_1$ and $W_2$ be two feasible partial solutions in $\mathcal{W}_{mj}$ with identical divisional characteristics, $d_{mj}(W_1) = d_{mj}(W_2)$. Suppose that at the end of the division, $W_1$ and $W_2$ give rise to two sets of feasible partial solutions, $S_1$ and $S_2$ respectively. Then $S_2$ can be constructed from $S_1$ in linear time (linear in the size of $S_1$).
Proof (Sketch). Let algorithm Finish run on the feasible partial solution $W_1$ over the division $[mj, mj + m)$. At the end of this, we have a set of feasible solutions that cover this division adequately; let this set be $S_1$. Similarly, we get a set of solutions $S_2$ after running Finish on $W_2$. Let $s_1$ be a solution in $S_1$. We describe how to construct a feasible partial solution $s_2 \in S_2$ from $s_1$. All chains that start at reads in $s_1$ are left as they are. Any chain that starts at a walk that enters the division is shifted so that it starts at its corresponding walk in $W_2$; such a correspondence exists because $W_1$ and $W_2$ have the same divisional characteristic. Two chains that start anywhere in the same fraction remain in the same fraction. Read coverage does not change within a fraction, only at the border of a fraction. Since the end point of each walk has the same coverage in both solutions, the algorithm must make the same decision in both cases (albeit shifted by a small amount). Therefore, $s_2$ is a feasible partial solution in $S_2$ which is obtained from $s_1$ by a slight shift of the first few chains.

The procedure described takes a solution in $S_1$ and creates one that belongs to $S_2$. Note that if we ran the procedure on $s_2$, we would get back $s_1$. Therefore, this procedure is a bijective mapping from $S_1$ to $S_2$. This means that we can reconstruct every solution in $S_2$ from a solution in $S_1$ and vice versa. In the worst case, the time needed is the size of the solution. □

Theorem 2. Algorithm Finish2 finds an optimum solution in $O(n^{7/4})$ time.

Proof. The correctness of Finish2 can be established with an induction argument analogous to the one used for Finish, in conjunction with Lemma 4.

There are a total of $n/m$ divisions. On entering each division, there is a set of $O(n)$ feasible partial solutions, which can be partitioned into $O(m^2)$ equivalence classes. We run Finish for one member of each equivalence class. Each run covers a division of length $m$ and takes $O(m^2)$ time, so this takes a total of $O(m^4)$ time. At the end of a division, we have a total of no more than $O(m^3)$ feasible solutions ($O(m)$ solutions from each of $O(m^2)$ runs). It takes linear time to reconstruct a global feasible solution from each of these. Since there are a total of $O(n)$ global feasible solutions at the end of the division, the total time needed is $O(n + m^3)$. Summing the time needed for entering a division, processing it and leaving it gives $O(n + m^4)$ per division. Setting $m = n^{1/4}$ balances the two terms, giving $O(n)$ time per division. There are $O(n/m) = O(n^{3/4})$ divisions in all. Hence, the total running time is $O(n^{7/4})$. □

[ As a side note we mention that the sub-cubic algorithm in [Har02] appears 
to have a running time of O(n^t) instead of the claimed 0(n^3) bound. The 
analysis is similar to the one used in theorem 2. In their case, there is an O(n^) 
finish algorithm. The number of chains that any feasible partial solution ends 
with is at most 0(nf). With these two facts in mind we redo the above analysis. 
There are ^ divisions. On entering each division there are O(n^) feasible par- 
tial solutions. This set is partitioned into 0(mf) equivalence classes. The finish 
algorithm takes 0{m^) time on each of these cases. Hence, the total time spent 
on entry is 0{rrr'). At the end of the division, there are 0{mf) feasible solutions 
{0{mf) solutions from each of 0{m?) runs). Updating each solution takes 0{m) 
time. There may be only O(n^) solutions at the end of the division. So, the net
time taken to finish a division is 0{m^ + n^). Setting m = ns minimizes this 
time. There are ^ divisions, so the total running time should be 0(n^5). The 
algorithm is still sub-cubic. ] 

5 Conclusion 

We have described a new algorithm for finding the optimum set of walks to finish a DNA sequence, with a worst-case running time of $O(n^{7/4})$. It is the first sub-quadratic algorithm for this problem. The algorithm can be made online, in the sense that the reads appear in left-to-right order and knowledge of $n$ is not necessary.

We show that the number of feasible solutions at any point of the problem interval is no more than $O(n)$. This is an improvement over the previously known $O(n^2)$. We believe that this number could be much smaller. Another direction for future work is to reduce the $O(n)$ update time required between two consecutive intervals. This may be possible with the help of a data structure suited to this problem. The running time of the algorithm depends strongly on $\epsilon$. It is very likely that there is an $O(n \operatorname{polylog} n)$ algorithm that is independent of $\epsilon$.
Abstract. A minimal blocker in a bipartite graph $G$ is a minimal set of edges the removal of which leaves no perfect matching in $G$. We give a polynomial delay algorithm for finding all minimal blockers of a given bipartite graph. Equivalently, this gives a polynomial delay algorithm for listing the anti-vertices of the perfect matching polytope $P(G) = \{x \in \mathbb{R}^E \mid Hx = e,\ x \ge 0\}$, where $H$ is the incidence matrix of $G$. We also give similar generation algorithms for other related problems, including $d$-factors in bipartite graphs, and perfect 2-matchings in general graphs.



1 Introduction 

Let $G = (A, B, E)$ be a bipartite graph with bipartition $A \cup B$ and edge set $E$. For subsets $A' \subseteq A$ and $B' \subseteq B$, denote by $(A', B')$ the subgraph of $G$ induced by these sets. A perfect matching in $G$ is a subgraph in which every vertex in $A \cup B$ has degree exactly one. Equivalently, it is a set of $|A| = |B|$ pairwise disjoint edges of $G$. Throughout the paper, we shall always assume that a graph with no vertices has a perfect matching.

A blocker in $G$ is a subset of edges $X \subseteq E$ such that the graph $G(A, B, E \setminus X)$ does not have any perfect matching. A blocker $X$ is minimal if, for every edge $e \in X$, the set $X \setminus \{e\}$ is not a blocker. Denote respectively by $\mathcal{M}(G)$ and $\mathcal{B}(G)$ the families of perfect matchings and minimal blockers of $G$. Note that in any bipartite graph $G$, the set $\mathcal{B}(G)$ is the family of minimal transversals to the family $\mathcal{M}(G)$, i.e. $\mathcal{B}(G)$ is the family of minimal edge sets containing an edge from every perfect matching in $G$. Clearly, the above definitions imply that if $|A| \ne |B|$, then $\mathcal{B}(G) = \{\emptyset\}$.

The main problem we consider in this paper is the following. 

GEN($\mathcal{B}(G)$): Given a bipartite graph $G$, enumerate all minimal blockers of $G$.
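For very small graphs, $\mathcal{B}(G)$ can be computed by brute force directly from the definitions, which can be handy for testing an efficient implementation. The sketch below runs in exponential time and is an illustration only, not the paper's algorithm; the helper names are ours. It tests for a perfect matching with Kuhn's augmenting-path method, a standard choice that the paper does not prescribe. It keeps every blocker $X$ such that no $X \setminus \{e\}$ is a blocker. Edges are assumed to be given as pairs $(a, b)$ with $a \in A$ and $b \in B$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (a, b) with a in A and b in B


def has_perfect_matching(A: List[str], B: List[str], edges: Iterable[Edge]) -> bool:
    """Kuhn's augmenting-path algorithm; a graph with no vertices counts as matched."""
    if len(A) != len(B):
        return False
    adj: Dict[str, List[str]] = {a: [] for a in A}
    for a, b in edges:
        adj[a].append(b)
    match_of: Dict[str, str] = {}  # b -> a

    def augment(a: str, seen: Set[str]) -> bool:
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match_of or augment(match_of[b], seen):
                    match_of[b] = a
                    return True
        return False

    return all(augment(a, set()) for a in A)


def minimal_blockers(A: List[str], B: List[str], edges: Set[Edge]) -> List[FrozenSet[Edge]]:
    """All minimal blockers, by exhaustive search over edge subsets (tiny graphs only)."""
    E = sorted(edges)

    def is_blocker(X: Iterable[Edge]) -> bool:
        return not has_perfect_matching(A, B, set(E) - set(X))

    return [frozenset(X)
            for k in range(len(E) + 1)
            for X in combinations(E, k)
            if is_blocker(X) and not any(is_blocker(set(X) - {e}) for e in X)]


print(minimal_blockers(["a1", "a2"], ["b1", "b2"],
                       {("a1", "b1"), ("a1", "b2"), ("a2", "b2")}))
# [frozenset({('a1', 'b1')}), frozenset({('a2', 'b2')})]
```

In this example, the edge $a_1b_2$ lies in no perfect matching and therefore belongs to no minimal blocker.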

The analogous problem GEN($\mathcal{M}(G)$) of enumerating perfect matchings in bipartite graphs was considered in [9,20].

* This research was supported by the National Science Foundation (Grant IIS-0118635). The third author is also grateful for the partial support by DIMACS, the National Science Foundation's Center for Discrete Mathematics and Theoretical Computer Science.
In such generation problems, the output size may be exponential in the input size. The efficiency of an algorithm is therefore evaluated by measuring the time it needs to find the next element, see e.g. [14]. More precisely, let the elements of a set $S$ be represented in some implicit way by the input $I$. We say that they are generated with polynomial delay if the generation algorithm produces all elements of $S$ in some order such that the computational time between any two consecutive outputs is bounded by a polynomial in the input size. The generation is said to be incrementally polynomial if the time needed to find the $(k+1)$st element, after the generation of the first $k$ elements, depends polynomially not only on the input size but also on $k$.

The problems of generating perfect matchings and minimal blockers are special cases of the more general open problems of generating the vertices and anti-vertices of a polytope given by its linear description. Indeed, let $H$ be the vertex-edge incidence matrix of $G = (A, B, E)$, and consider the polytope $P(G) = \{x \in \mathbb{R}^E \mid Hx = e,\ x \ge 0\}$, where $e \in \mathbb{R}^{|A| + |B|}$ is the vector of all ones. Then the perfect matchings of $G$ are in one-to-one correspondence with the vertices of $P(G)$. These in turn are in one-to-one correspondence with the minimal collections of columns of $H$ containing the vector $e$ in their conic hull. On the other hand, the minimal blockers of $G$ are in one-to-one correspondence with the anti-vertices of $P(G)$, i.e. the maximal collections of columns of $H$ not containing $e$ in their conic hull. Both corresponding generation problems are open in general, see [2] for details. In the special case when $H$ is the incidence matrix of a bipartite graph $G$, the problem of generating the vertices of $P(G)$ can be solved with polynomial delay [9,20]. In this note, we obtain an analogous result for the anti-vertices of $P(G)$.
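As a small illustration of the link between perfect matchings and points of $P(G)$, the sketch below builds the vertex-edge incidence matrix $H$ as nested Python lists. It then checks $Hx = e$, $x \ge 0$ for a given vector $x$. A 0/1 vector passes this test exactly when it is the indicator vector of a perfect matching. The helper names and the four-vertex example graph are ours, chosen for illustration.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def incidence_matrix(vertices: Sequence[str], edges: Sequence[Edge]) -> List[List[int]]:
    """Vertex-edge incidence matrix H: H[v][e] = 1 iff vertex v is an endpoint of edge e."""
    return [[1 if v in e else 0 for e in edges] for v in vertices]


def in_polytope(H: List[List[int]], x: Sequence[float]) -> bool:
    """True iff x >= 0 and Hx = e (the all-ones vector)."""
    return all(xi >= 0 for xi in x) and all(
        sum(h * xi for h, xi in zip(row, x)) == 1 for row in H)


edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b2")]
H = incidence_matrix(["a1", "a2", "b1", "b2"], edges)
print(in_polytope(H, [1, 0, 1]))  # True: {a1b1, a2b2} is a perfect matching
print(in_polytope(H, [0, 1, 1]))  # False: b1 is uncovered and a1 is not
```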

Theorem 1. Problem GEN($\mathcal{B}(G)$) can be solved with polynomial delay.

For non-bipartite graphs $G$, it is well known that the vertices of $P(G)$ are half-integral [16] (i.e. the components of each vertex are in $\{0, 1, 1/2\}$). It is also well known that they correspond to the basic 2-matchings of $G$, i.e. subsets of edges that cover the vertices with vertex-disjoint edges and vertex-disjoint odd cycles. Polynomial delay algorithms exist for listing the vertices of 0/1-polytopes, simple polytopes and simplicial polytopes [1,5], but the status of the problem for general polytopes is still open. We show here that for the special case of the polytope $P(G)$ of a graph $G$, the problem can be solved in incremental polynomial time.

Theorem 2. All basic perfect 2-matchings of a graph G can be generated in 
incremental polynomial time. 

The proof of Theorem 1 is based on a nice characterization of minimal blockers, which may be of independent interest. The method used for enumeration is the supergraph method, which will be outlined in Section 3.1. A special case of this method, when applicable, can provide enumeration algorithms with stronger properties, such as enumeration in lexicographic order or enumeration with polynomial space. This special case will be described in Section 3.2. It will be used in Section 5 to solve two other related problems: enumerating $d$-factors in bipartite graphs, and enumerating perfect 2-matchings in general graphs. The latter result will be used to prove Theorem 2.

2 Properties of Minimal Blockers 

2.1 The Neighborhood Function 

Let $G = (A, B, E)$ be a bipartite graph. For a subset $X \subseteq A \cup B$, denote by $\Gamma(X) = \Gamma_G(X) = \{v \in (A \cup B) \setminus X \mid \{v, x\} \in E \text{ for some } x \in X\}$ the set of neighbors of $X$. It is known (see, e.g., [15]), and also easy to verify, that the set-function $|\Gamma(\cdot)|$ is submodular, i.e. $|\Gamma(X \cup Y)| + |\Gamma(X \cap Y)| \le |\Gamma(X)| + |\Gamma(Y)|$ holds for all subsets $X, Y \subseteq A \cup B$. A simple but useful observation about submodular set-functions is the following (see, e.g., [17]):

Proposition 1. Let $V$ be a finite set and $f : 2^V \to \mathbb{R}$ be a submodular set-function. Then the family of sets $X$ minimizing $f(X)$ forms a sub-lattice of the Boolean lattice $2^V$. If $f(X)$ is polynomial-time computable for every $X \subseteq V$, then the unique minimum and maximum of this sub-lattice can be computed in polynomial time.
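Both the submodularity of $|\Gamma(\cdot)|$ and the lattice property of Proposition 1 can be checked exhaustively on a small graph. The sketch below uses helper names of our own and is exponential in $|V|$, so it is suitable only for tiny examples. It computes $\Gamma(X)$ from an adjacency map and verifies the submodular inequality for all pairs $X, Y$. It then checks that the minimizers of $|\Gamma(\cdot)|$ are closed under union and intersection. It checks one given graph only and is not a proof.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def gamma(adj: Dict[str, Set[str]], X: FrozenSet[str]) -> Set[str]:
    """Γ(X): vertices outside X that have a neighbour in X."""
    return {v for x in X for v in adj[x]} - X


def check_submodular_and_lattice(adj: Dict[str, Set[str]]) -> bool:
    V = sorted(adj)
    subsets: List[FrozenSet[str]] = [frozenset(c) for k in range(len(V) + 1)
                                     for c in combinations(V, k)]
    f = {X: len(gamma(adj, X)) for X in subsets}
    # |Γ(X ∪ Y)| + |Γ(X ∩ Y)| <= |Γ(X)| + |Γ(Y)| for every pair of subsets.
    if any(f[X | Y] + f[X & Y] > f[X] + f[Y] for X in subsets for Y in subsets):
        return False
    # Proposition 1: the minimisers form a sub-lattice (closed under ∪ and ∩).
    best = min(f.values())
    minimisers = [X for X in subsets if f[X] == best]
    return all(f[X | Y] == best and f[X & Y] == best
               for X in minimisers for Y in minimisers)


# Path b1 - a1 - b2 - a2 written as a symmetric adjacency map.
adj = {"a1": {"b1", "b2"}, "a2": {"b2"}, "b1": {"a1"}, "b2": {"a1", "a2"}}
print(sorted(gamma(adj, frozenset({"a1"}))))  # ['b1', 'b2']
print(check_submodular_and_lattice(adj))      # True
```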

2.2 Matchable Graphs 

Matchable graphs, a special class of bipartite graphs with a positive surplus (see 
Section 1.3 in [16] for definitions and properties), were introduced in [12]. They 
play an important role in the characterization of minimal blockers. 

Definition 1 ([12]). A bipartite graph $G = (A, B, E)$ with $|A| = |B| + 1$ is said to be matchable if the graph $G \setminus v$ has a perfect matching for every vertex $v \in A$.
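Definition 1 translates directly into a polynomial-time test: delete each vertex of $A$ in turn and check for a perfect matching. The sketch below does this with Kuhn's augmenting-path algorithm, a standard choice that the paper does not prescribe. The function names and the $(a, b)$ edge representation are ours.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (a, b) with a in A and b in B


def has_perfect_matching(A: List[str], B: List[str], edges: Iterable[Edge]) -> bool:
    """Kuhn's augmenting-path algorithm; a graph with no vertices counts as matched."""
    if len(A) != len(B):
        return False
    adj: Dict[str, List[str]] = {a: [] for a in A}
    for a, b in edges:
        adj[a].append(b)
    match_of: Dict[str, str] = {}  # b -> a

    def augment(a: str, seen: Set[str]) -> bool:
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match_of or augment(match_of[b], seen):
                    match_of[b] = a
                    return True
        return False

    return all(augment(a, set()) for a in A)


def is_matchable(A: List[str], B: List[str], edges: Set[Edge]) -> bool:
    """Definition 1: |A| = |B| + 1 and G - v has a perfect matching for every v in A."""
    if len(A) != len(B) + 1:
        return False
    return all(has_perfect_matching([a for a in A if a != v], B,
                                    [(a, b) for a, b in edges if a != v])
               for v in A)


print(is_matchable(["a1"], [], set()))                                   # True
print(is_matchable(["a1", "a2"], ["b2"], {("a1", "b2"), ("a2", "b2")}))  # True
print(is_matchable(["a1", "a2"], ["b1"], {("a1", "b1")}))                # False
```

In the last call, deleting $a_1$ leaves $a_2$ with no neighbor, so the graph is not matchable.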

The smallest matchable graph is one in which $|A| = 1$ and $|B| = 0$. The following is a useful equivalent characterization of matchable graphs.

Proposition 2 ([12]). For a bipartite graph $G = (A, B, E)$ with $|A| = |B| + 1$, the following statements are equivalent:

(M1) $G$ is matchable;

(M2) $G$ has a matchable spanning tree, that is, a spanning tree $T$ in which every node $v \in B$ has degree 2.

It can be shown that for bipartite graphs $G = (A, B, E)$ with $|A| = |B| + 1$, these properties are further equivalent to

(M3) $G$ contains a unique independent set of size $|A|$, namely the set $A$.

Given a bipartite graph $G$, let us say that a matchable subgraph of $G$ is maximal if it is not properly contained in any other matchable subgraph of $G$. The following property can be derived from a result by Dulmage and Mendelsohn, or more generally from the Gallai-Edmonds Structure Theorem (see Section 3.2 in [16] for details and references).
Proposition 3. Let $G = (A, B, E)$ be a matchable graph with $|A| = |B| + 1$, and let $a \in A$ and $b \in B$ be given vertices. Then the graph $G' = G - \{a, b\}$ is the union of a maximal matchable subgraph $G_1 = (A_1, B_1)$ of $G'$ and a (possibly empty) subgraph $G_2 = (A_2, B_2)$ which has a perfect matching, such that the graph $(A_1, B_2)$ is empty. Furthermore, this decomposition of $G'$ into $G_1$ and $G_2$ is unique, and can be found in polynomial time.

The next proposition acts, in a sense, as the converse of Proposition 3, and can be derived in a similar way. Given a decomposition of a matchable graph into a matchable subgraph and a subgraph with a perfect matching, we can reconstruct the original matchable graph in a unique way.

Proposition 4. Let $G = (A, B, E)$ be a bipartite graph, and let $G_1 = (A_1, B_1)$ and $G_2 = (A_2, B_2)$ be two vertex-disjoint subgraphs of $G$ such that $G_1$ is matchable, $G_2$ has a perfect matching, $A = A_1 \cup A_2$ and $B = B_1 \cup B_2$. Then there is a unique extension of $G_1$ into a maximal matchable subgraph $G_1' = (A_1', B_1')$ of $G$ such that the graph $G_2' \stackrel{\text{def}}{=} (A_1 \cup A_2 \setminus A_1',\ B_1 \cup B_2 \setminus B_1')$ has a perfect matching. Such an extension is the unique possible decomposition of $G$ into a maximal matchable graph plus a graph with a perfect matching, and it can be computed in polynomial time.

As a simple application of Proposition 4, we obtain the following corollary.

Corollary 1. Let $G = (A, B, E)$ be a bipartite graph and $G' = (A', B')$ be a matchable subgraph of $G$. Let $a \in A \setminus A'$ and $b \in B \setminus B'$ be two vertices such that there is an edge $\{b, a'\}$ for some $a' \in A'$. Then there exists a unique decomposition of $G'' = (A' \cup \{a\}, B' \cup \{b\})$ into a maximal matchable graph plus a graph with a perfect matching.

The next property can be derived analogously to Proposition 3. It implies that matchable graphs are "continuous" in the following sense: from a given matchable graph, we can reach any matchable graph containing it by repeatedly appending pairs of vertices, one pair at a time, always remaining within the class of matchable graphs.

Proposition 5. Let $T = (A, B, E)$ and $T' = (A', B', E')$ be two matchable spanning trees such that $|A| = |B| + 1$, $|A'| = |B'| + 1$, $A \subsetneq A'$ and $B \subsetneq B'$. Then:

(i) there exists a vertex $b \in B' \setminus B$ such that $|\Gamma_{T'}(\{b\}) \cap A| \ge 1$;

(ii) for any such vertex $b$, there exists a vertex $a \in A' \setminus A$ such that the graph $G[a, b] \stackrel{\text{def}}{=} (A \cup \{a\}, B \cup \{b\}, E \cup E')$ is matchable.

The above properties readily imply the following corollary. 

Corollary 2. Let $G = (A, B, E)$ and $G' = (A', B', E')$ be two matchable graphs such that $|A| = |B| + 1$, $|A'| = |B'| + 1$, $A \subsetneq A'$ and $B \subsetneq B'$. Then there exists a pair of vertices $a \in A' \setminus A$ and $b \in B' \setminus B$ such that the graph $(A \cup \{a\}, B \cup \{b\}, E \cup E')$ is matchable.

To arrive at a useful characterization of minimal blockers, we need one more definition.
Definition 2. Let $G = (A, B, E)$ be a bipartite graph having a perfect matching. A matchable split of $G$, denoted by $[(A_1, B_1), (A_2, B_2), (A_3, B_3)]$, is a partition of the vertex set of $G$ into sets $A_1, A_2, A_3 \subseteq A$ and $B_1, B_2, B_3 \subseteq B$ such that

(B1) $|A_1| = |B_1| + 1$, $|A_2| = |B_2| - 1$, $|A_3| = |B_3|$,

(B2) the graph $G_1 = (A_1, B_1)$ is matchable,

(B3) the graph $G_2 = (A_2, B_2)$ is matchable,

(B4) the graph $G_3 = (A_3, B_3)$ has a perfect matching, and

(B5) the graphs $(A_1, B_3)$ and $(A_3, B_2)$ are empty.

Denote by $\mathcal{S}(G)$ the family of all matchable splits of $G$. We are now ready to state our characterization of minimal blockers in bipartite graphs.

Lemma 1. Let $G = (A, B, E)$ be a bipartite graph in which there exists a perfect matching. Then a subset of edges $X \subseteq E$ is a minimal blocker if and only if there is a matchable split $[(A_1, B_1), (A_2, B_2), (A_3, B_3)]$ of $G$ such that $X = \{\{a, b\} \in E : a \in A_1,\ b \in B_2\}$ is the set of all edges between $A_1$ and $B_2$ in $G$. This correspondence between minimal blockers and matchable splits is one-to-one. Given one representation, we can compute the other in polynomial time.
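Given a matchable split, the blocker of Lemma 1 can be read off directly. The sketch below uses our own helper name and assumes edges are stored as pairs $(a, b)$ with $a \in A$ and $b \in B$. Take the hypothetical graph with edges $a_1b_1$, $a_1b_2$ and $a_2b_2$. One can check that both $[(\{a_2\}, \emptyset), (\{a_1\}, \{b_1, b_2\}), (\emptyset, \emptyset)]$ and $[(\{a_1, a_2\}, \{b_2\}), (\emptyset, \{b_1\}), (\emptyset, \emptyset)]$ satisfy (B1) to (B5). They yield the two minimal blockers $\{a_2b_2\}$ and $\{a_1b_1\}$ of that graph.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (a, b) with a in A and b in B


def blocker_from_split(edges: Set[Edge], A1: Set[str], B2: Set[str]) -> FrozenSet[Edge]:
    """Lemma 1: the minimal blocker of a matchable split is the set of edges between A1 and B2."""
    return frozenset((a, b) for a, b in edges if a in A1 and b in B2)


E = {("a1", "b1"), ("a1", "b2"), ("a2", "b2")}
print(blocker_from_split(E, {"a2"}, {"b1", "b2"}))  # frozenset({('a2', 'b2')})
print(blocker_from_split(E, {"a1", "a2"}, {"b1"}))  # frozenset({('a1', 'b1')})
```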

3 Enumeration Algorithms 

Given a (bipartite) graph $G = (V, E)$, consider the problem of listing all subgraphs of $G$, or correspondingly, the family $\mathcal{F}_\pi \subseteq 2^E$ of all minimal subsets of $E$ satisfying some monotone property $\pi : 2^E \to \{0, 1\}$. For instance, if $\pi(X)$ is the property that the subgraph with edge set $X \subseteq E$ has a perfect matching, then $\mathcal{F}_\pi$ is the family of perfect matchings of $G$. Enumeration algorithms for listing subgraphs that satisfy a number of monotone properties are well known. For instance, it is known [19] that the problems of listing all minimal cuts or all spanning trees of an undirected graph $G = (V, E)$ can be solved with delay $O(|E|)$ per generated cut or spanning tree. It is also known (see e.g. [7,10,18]) that all minimal $(s, t)$-cuts or $(s, t)$-paths can be listed with delay $O(|E|)$ per cut or path. Furthermore, polynomial delay algorithms also exist for listing perfect matchings, maximal matchings and maximum matchings in bipartite graphs, and maximal matchings in general graphs, see e.g. [9,20,21].

In the next subsections we give an overview of two commonly used techniques 
for solving such enumeration problems. 

3.1 The Supergraph Approach 

This technique works by building and traversing a directed graph $\mathcal{G}$, defined in some way on the set of elements of the family to be generated, $\mathcal{F}_\pi \subseteq 2^E$. The arcs of $\mathcal{G}$ are defined by a polynomial-time computable neighborhood function $\mathcal{N} : \mathcal{F}_\pi \to 2^{\mathcal{F}_\pi}$. For any $X \in \mathcal{F}_\pi$, this function defines the set $\mathcal{N}(X)$ of its outgoing neighbors in $\mathcal{G}$. A special vertex $X_0 \in \mathcal{F}_\pi$ is identified from which all other vertices of $\mathcal{G}$ are reachable. The method works by performing a traversal (breadth- or depth-first search) of the nodes of $\mathcal{G}$, starting from $X_0$. The following is a basic fact about this approach:
(S1) Suppose that $\mathcal{G}$ is strongly connected and that $|\mathcal{N}(X)| \le p(|\pi|)$ for every $X \in \mathcal{F}_\pi$, where $p(|\pi|)$ is a polynomial that depends only on the size of the description of $\pi$. Then the supergraph approach gives us a polynomial delay algorithm for enumerating $\mathcal{F}_\pi$, starting from an arbitrary set $X_0 \in \mathcal{F}_\pi$.

As an example, consider the generation of minima of submodular functions. Let $f : 2^V \to \mathbb{R}$ be a submodular function and $\alpha = \min\{f(X) : X \subseteq V\}$, and let $\pi(X)$ be the property that $X \subseteq V$ contains a minimizer of $f$. By Proposition 1, the set $\mathcal{F}_\pi$ forms a sub-lattice $\mathcal{L}$ of $2^V$, and the smallest element $X_0$ of this lattice can be computed in polynomial time. For $X \in \mathcal{F}_\pi$, define $\mathcal{N}(X)$ to be the set of subsets $Y \in \mathcal{L}$ that immediately succeed $X$ in the lattice order. The elements of $\mathcal{N}(X)$ can be computed as follows. For each $v \in V \setminus X$, find the smallest-cardinality minimizer $X' = \operatorname{argmin}\{f(Y) : Y \supseteq X \cup \{v\}\}$ and check whether $f(X') = \alpha$. Then $|\mathcal{N}(X)| \le |V|$ for all $X \in \mathcal{F}_\pi$. Note also that the definition of $\mathcal{N}(\cdot)$ implies that the supergraph is strongly connected. Thus (S1) implies that all the elements of $\mathcal{F}_\pi$ can be enumerated with polynomial delay.
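The traversal behind (S1) is an ordinary graph search in which nodes are generated on demand. Below is a minimal breadth-first sketch. It assumes hashable nodes and a user-supplied `neighbours` function playing the role of $\mathcal{N}(\cdot)$; both names are ours. Between two consecutive outputs, it makes one call to `neighbours` plus some set lookups. This is where the polynomial delay comes from when $\mathcal{N}(\cdot)$ is polynomially bounded and polynomial-time computable. The `seen` set makes memory grow with the number of outputs.

```python
from collections import deque
from typing import Callable, FrozenSet, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)


def supergraph_enumerate(x0: T, neighbours: Callable[[T], Iterable[T]]) -> Iterator[T]:
    """Breadth-first traversal of the supergraph from X_0, yielding every reachable node once."""
    seen = {x0}
    queue = deque([x0])
    while queue:
        x = queue.popleft()
        yield x
        for y in neighbours(x):
            if y not in seen:
                seen.add(y)
                queue.append(y)


def up(X: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Toy neighbourhood on subsets of {0, 1, 2}: arcs X -> X ∪ {v}."""
    return [X | {v} for v in range(3) if v not in X]


print(len(list(supergraph_enumerate(frozenset(), up))))  # 8
```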

3.2 The Flashlight Approach 

This can be regarded as an important special case of the supergraph method. Assume that we have fixed some order on the elements of $E$. Let $X_0$ be the lexicographically smallest element of $\mathcal{F}_\pi$. For any $X \in \mathcal{F}_\pi$, let $\mathcal{N}(X)$ consist of a single element, namely the element next to $X$ in lexicographic order. Thus, the supergraph $\mathcal{G}$ in this case is a Hamiltonian path on the elements of $\mathcal{F}_\pi$ (see [19] for general background on backtracking algorithms). The following is a sufficient condition for the operator $\mathcal{N}(\cdot)$ (and also for the element $X_0$) to be computable in polynomial time:

(F1) For any two disjoint subsets $S_1, S_2$ of $E$, we can check in polynomial time $p(|\pi|)$ whether there is an element $X \in \mathcal{F}_\pi$ such that $X \supseteq S_1$ and $X \cap S_2 = \emptyset$.

In this case, the traversal of $\mathcal{G}$ can be organized as a backtracking tree of depth $|E|$ whose leaves are the elements of $\mathcal{F}_\pi$, as follows. Each node of the tree is identified with two disjoint subsets $S_1, S_2 \subseteq E$ and has at most two children. At the root of the tree, we have $S_1 = S_2 = \emptyset$. The left child of any node $(S_1, S_2)$ of the tree at level $i$ is $(S_1 \cup \{i\}, S_2)$, provided that there is an $X \in \mathcal{F}_\pi$ such that $X \supseteq S_1 \cup \{i\}$ and $X \cap S_2 = \emptyset$. The right child of any node $(S_1, S_2)$ of the tree at level $i$ is $(S_1, S_2 \cup \{i\})$, provided that there is an $X \in \mathcal{F}_\pi$ such that $X \supseteq S_1$ and $X \cap (S_2 \cup \{i\}) = \emptyset$.

Furthermore, we may restrict our attention to subsets $S_1$ satisfying certain properties. More precisely, let $\mathcal{F}'_\pi \subseteq 2^E$ be a family of sets containing $\mathcal{F}_\pi$ such that we can test in polynomial time whether a set $X$ belongs to it, and such that for every $X \in \mathcal{F}_\pi$ there is a set $S \in \mathcal{F}'_\pi$ contained in $X$. Then the following is a weaker requirement than (F1):

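(F2) For any two disjoint subsets $S_1 \in \mathcal{F}'_\pi$ and $S_2 \subseteq E$, and a given element $i \in E \setminus (S_1 \cup S_2)$ such that $S_1 \cup \{i\} \in \mathcal{F}'_\pi$, we can check in polynomial time $p(|\pi|)$ whether there is an element $X \in \mathcal{F}_\pi$ such that $X \supseteq S_1 \cup \{i\}$ and $X \cap S_2 = \emptyset$.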
In this way, under assumption (F2), we get a polynomial delay, polynomial space algorithm for enumerating the elements of $\mathcal{F}_\pi$ in lexicographic order.
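The backtracking tree can be written as a recursive generator. In the sketch below, the names are ours, and `extendable(S1, S2)` is an assumed oracle answering the (F1) query. For testing only, the example builds that oracle from an explicit family; in a real application, the oracle would be a polynomial-time test. The recursion depth is at most $|E|$ and only the current path is stored, which gives polynomial space. Outputs come in the order produced by trying "include" before "exclude" at each level.

```python
from typing import Callable, FrozenSet, Iterator, List

Oracle = Callable[[FrozenSet[int], FrozenSet[int]], bool]


def flashlight(ground: List[int], extendable: Oracle) -> Iterator[FrozenSet[int]]:
    """Enumerate the family F (subsets of `ground`) by backtracking; extendable(S1, S2)
    must report whether some X in F satisfies X ⊇ S1 and X ∩ S2 = ∅ (condition (F1))."""
    def rec(i: int, S1: FrozenSet[int], S2: FrozenSet[int]) -> Iterator[FrozenSet[int]]:
        if i == len(ground):
            yield S1  # S1 ∪ S2 = ground, so the witnessed X must equal S1
            return
        e = ground[i]
        if extendable(S1 | {e}, S2):   # left child: e ∈ X
            yield from rec(i + 1, S1 | {e}, S2)
        if extendable(S1, S2 | {e}):   # right child: e ∉ X
            yield from rec(i + 1, S1, S2 | {e})

    if extendable(frozenset(), frozenset()):  # is F non-empty at all?
        yield from rec(0, frozenset(), frozenset())


family = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2})]


def oracle(S1: FrozenSet[int], S2: FrozenSet[int]) -> bool:
    """Test-only oracle built from an explicit family."""
    return any(S1 <= X and not (X & S2) for X in family)


print(list(flashlight([0, 1, 2], oracle)))
# [frozenset({0, 1}), frozenset({1, 2}), frozenset({2})]
```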

As an example, let us consider the generation of maximal matchable subgraphs. Let $G = (A, B, E)$ be a bipartite graph. For subsets $A' \subseteq A$ and $B' \subseteq B$, let $\pi(A', B')$ be the property that the graph $(A', B')$ contains a maximal matchable subgraph. To generate the family of maximal matchable graphs, we introduce an artificial element $*$ and consider the ground set $E' = \{(a, b) : a \in A,\ b \in B \cup \{*\}\}$. Any $X \subseteq E'$ represents an induced subgraph $G(X) = (A', B')$ of $G$, where $A' = \{a : (a, b) \in X\}$ and $B' = \{b : (a, b) \in X,\ b \ne *\}$. Furthermore, let us say that the elements of $X \subseteq E'$ are disjoint if for every pair of distinct elements $(a, b), (a', b') \in X$ we have $a \ne a'$ and $b \ne b'$. Let $\mathcal{F}'_\pi = \{X \subseteq E' :$ the elements of $X$ are disjoint, one of them contains $*$, and $G(X)$ is matchable$\}$. Consider two disjoint sets of pairs $S_1, S_2 \subseteq E'$ with $S_1 \in \mathcal{F}'_\pi$, and a pair $(a, b) \in E' \setminus (S_1 \cup S_2)$. For these, the check in (F2) can be performed in polynomial time. Indeed, by Corollary 2, all we need to check is that the graph $G(S_1 \cup \{(a, b)\})$ is matchable.

4 Polynomial Delay Generation of Minimal Blockers 

In this section, we use the supergraph approach to show that minimal blockers
can be enumerated with polynomial delay. Using Lemma 1, we may equivalently
consider the generation of matchable splits. We start by defining the neighborhood function used for constructing the supergraph $\mathcal{G}_S$ of matchable splits.
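Once the supergraph is strongly connected, the supergraph approach is essentially a graph search: start from any one vertex and output every vertex reached. The sketch below uses a breadth-first search. The `neighbors(x)` callback is a hypothetical stand-in for $\mathcal{N}(X, a, b)$ ranging over all admissible edges. To keep the demo self-contained, it uses a toy supergraph on permutations (adjacent transpositions), not matchable splits. The visited set grows with the output, so this variant uses space proportional to the number of vertices.

```python
from collections import deque


def traverse_supergraph(start, neighbors):
    """Yield every vertex reachable from `start`, each exactly once.

    If the supergraph is strongly connected, this is every vertex.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        yield x                      # output x, then expand its out-neighbors
        for y in neighbors(x):
            if y not in seen:
                seen.add(y)
                queue.append(y)


def adjacent_transpositions(p):
    """Toy neighbor function (not matchable splits): swap two adjacent entries."""
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        yield tuple(q)


print(list(traverse_supergraph((0, 1, 2), adjacent_transpositions)))
```

The delay between two outputs is the cost of expanding one vertex. This is polynomial whenever each vertex has polynomially many neighbors, each computable in polynomial time.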

4.1 The Supergraph 

Let $G = (A, B, E)$ be a bipartite graph that has a perfect matching. Given a
matchable split $X = [G_1 = (A_1, B_1), G_2 = (A_2, B_2), G_3 = (A_3, B_3)] \in \mathcal{S}(G)$,
the set of outgoing neighbors of $X$ in $\mathcal{G}_S$ is defined as follows. For each edge
$\{a, b\} \in E$ connecting a vertex $a \in A_2$ to a vertex $b \in B_2$, such that $b$ is also
connected to some vertex $a' \in A_1$, there is a unique neighbor $X' = \mathcal{N}(X, a, b)$
of $X$, obtained by the following procedure:

1. Let $G''_1 = (A_1 \cup \{a\}, B_1 \cup \{b\})$. Because of the edges $\{a, b\}, \{a', b\} \in E$, the
graph $G''_1$ is matchable; see Corollary 1.

2. Delete the vertices $a, b$ from $G_2$. This splits the graph $G_2 - \{a, b\}$ in a unique
way into a matchable graph $G'_2 = (A'_2, B'_2)$ and a graph with a perfect matching
$G''_2 = (A''_2, B''_2)$ such that there are no edges in $E$ between $A''_2$ and $B'_2$; see
Proposition 3.

3. Let $G'_3 = G_3 \cup G''_2$. Then the graph $G'_3$ has a perfect matching. Using
Proposition 4, extend $G''_1$ within the union $G''_1 \cup G'_3$ into a maximal
matchable graph $G'_1 = (A'_1, B'_1)$ such that the remaining graph $G''_3 = (A''_3, B''_3)$,
obtained from $G''_1 \cup G'_3$ by deleting the vertices of $G'_1$, has a perfect matching.

4. Let $X' = [G'_1, G'_2, G''_3] = [(A'_1, B'_1), (A'_2, B'_2), (A''_3, B''_3)]$, and note that $X' \in \mathcal{S}(G)$.

Similarly, for each edge $\{a, b\} \in E$ connecting a vertex $a \in A_1$ to a vertex
$b \in B_1$, such that $a$ is also connected to some vertex $b' \in B_2$, there is a unique
neighbor $X'$ of $X$, obtained by a procedure similar to the above.
4.2 Strong Connectivity

For the purpose of generating minimal blockers in a bipartite graph G = 
(A, B, E), we may assume without loss of generality that 

(A1) The graph $G$ is connected.

(A2) Every edge in G appears in a perfect matching. 

It is easy to see that both properties can be checked in polynomial time (see
elementary bipartite graphs in [16]). Indeed, if $G$ has several connected
components, then the set of minimal blockers of $G$ is the union of the sets of
minimal blockers of the different components, computed individually for each
component. Furthermore, all the edges in $G$ that do not appear in any perfect
matching can be removed (in polynomial time), since they do not appear in any
minimal blocker. In this section, we prove the following.
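Assumption (A2) can be enforced with nothing more than a perfect-matching test. When $|A| = |B|$, an edge $\{a, b\}$ lies in some perfect matching exactly when $G - \{a, b\}$ has a perfect matching. The sketch below uses Kuhn's augmenting-path algorithm, chosen for brevity rather than speed. Function names are ours, and the component split for (A1) is not shown.

```python
def has_perfect_matching(A, B, edges):
    """Kuhn's augmenting-path test for a perfect matching in the bipartite graph (A, B, edges)."""
    if len(A) != len(B):
        return False
    adj = {a: [] for a in A}
    for a, b in edges:
        adj[a].append(b)
    match_of = {}  # b -> the vertex a currently matched to it

    def augment(a, visited):
        for b in adj[a]:
            if b not in visited:
                visited.add(b)
                if b not in match_of or augment(match_of[b], visited):
                    match_of[b] = a
                    return True
        return False

    # If some a cannot be augmented, a maximum matching misses a vertex of A.
    return all(augment(a, set()) for a in A)


def prune_to_allowed_edges(A, B, edges):
    """Keep exactly the edges that lie in at least one perfect matching (assumption (A2))."""
    kept = []
    for a, b in edges:
        rest = [(x, y) for (x, y) in edges if x != a and y != b]
        if has_perfect_matching([x for x in A if x != a], [y for y in B if y != b], rest):
            kept.append((a, b))
    return kept


print(prune_to_allowed_edges([0, 1], [0, 1], [(0, 0), (0, 1), (1, 1)]))
```

In the example, edge $(0, 1)$ is dropped: using it leaves vertex $1 \in A$ with no partner.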

Lemma 2. Under the assumptions (A1) and (A2), the supergraph $\mathcal{G}_S$ is strongly
connected.

Clearly, Lemma 2 implies Theorem 1, according to (S1). Call the set of edges
incident to a given vertex $v \in A \cup B$ a $v$-star, and denote it by $v^*$. We start
with the following lemma.

Lemma 3. (i) Under the assumptions (A1) and (A2), every star is a minimal blocker. (ii) The matchable split corresponding to an $a$-star, $a \in A$, is
$[(\{a\}, \emptyset), (A \setminus \{a\}, B), (\emptyset, \emptyset)]$. (iii) The matchable split corresponding to a $b$-star,
$b \in B$, is $[(A, B \setminus \{b\}), (\emptyset, \{b\}), (\emptyset, \emptyset)]$.

Given two matchable splits $X = [(A_1, B_1), (A_2, B_2), (A_3, B_3)]$ and
$X' = [(A'_1, B'_1), (A'_2, B'_2), (A'_3, B'_3)]$, let us say that $X$ and $X'$ are perfectly nested if

(N1) $A_1 \subseteq A'_1$ and $B_1 \subseteq B'_1$,
(N2) $A_2 \supseteq A'_2$ and $B_2 \supseteq B'_2$, and

(N3) each of the graphs $(A_2 \cap A'_1, B_2 \cap B'_1)$, $(A_3 \cap A'_1, B_3 \cap B'_1)$, $(A_2 \cap A'_3, B_2 \cap B'_3)$,
and $(A_3 \cap A'_3, B_3 \cap B'_3)$ has a perfect matching.

The strong connectivity of $\mathcal{G}_S$ is a consequence of the following fact.

Lemma 4. There is a path in $\mathcal{G}_S$ between any pair of perfectly nested matchable
splits.

Proof. Let $X = [(A_1, B_1), (A_2, B_2), (A_3, B_3)]$ and $X' = [(A'_1, B'_1), (A'_2, B'_2), (A'_3, B'_3)]$
be two matchable splits satisfying (N1), (N2) and (N3) above. We show that
there is a path in $\mathcal{G}_S$ from $X$ to $X'$; a path in the opposite direction can be found
similarly. Fix a perfect matching $M$ in the graph $(A'_1 \cap A_2, B'_1 \cap B_2)$. By Proposition
5, there is a vertex $b \in B'_1 \setminus B_1$ such that $b$ is connected by an edge $\{a, b\}$ to
some vertex $a \in A_1$. Note that $b \notin B_3 \cap B'_1$, since the graph $(A_1, B_3)$ is empty.
Thus $b \in B_2 \cap B'_1$. Let $a' \in A'_1 \cap A_2$ be the vertex matched by $M$ to $b$. Now
consider the neighbor $X'' = \mathcal{N}(X, a', b) = [(A''_1, B''_1), (A''_2, B''_2), (A''_3, B''_3)]$ of $X$
in $\mathcal{G}_S$. We claim that $X''$ is perfectly nested with respect to $X'$. To see this,
observe, by Proposition 2, that the graph $(A_2, B_2) - \{a', b\}$ is decomposed, in a
unique way, into the matchable graph $(A''_2, B''_2)$ and a graph $H$ with a perfect
matching. On the other hand, the graph $(A'_2, B'_2)$ is matchable,
while the graph $(A_2 \cap A'_1 \setminus \{a'\}, B_2 \cap B'_1 \setminus \{b\}) \cup (A_2 \cap A'_3, B_2 \cap B'_3)$ has a perfect
matching. Therefore, by Proposition 3, the union of these two graphs (which
is $(A_2, B_2) - \{a', b\}$) decomposes in a unique way into a matchable graph
containing $(A'_2, B'_2)$ and a graph with a perfect matching. The uniqueness
of the decomposition implies that $(A''_2, B''_2)$ contains $(A'_2, B'_2)$, i.e., $A''_2 \supseteq A'_2$ and
$B''_2 \supseteq B'_2$. Note that the graph $(A''_2 \cap A'_1, B''_2 \cap B'_1)$ still has a perfect matching.
Note also that the graph $(A''_1, B''_1)$ is obtained by extending the matchable graph
$(A_1 \cup \{a'\}, B_1 \cup \{b\})$ with pairs from the graph $H \cup (A'_1 \cap A_3, B'_1 \cap B_3)$,
which has a perfect matching. Such an extension must stay within the graph
$(A'_1, B'_1)$, i.e., $A''_1 \subseteq A'_1$ and $B''_1 \subseteq B'_1$, since there are no edges between $A'_1$ and
$B'_3$. Finally, since the extension leaves a perfect matching in the part of
$H \cup (A'_1 \cap A_3, B'_1 \cap B_3)$ that it does not absorb, we conclude that $X''$ and $X'$
are perfectly nested. This way, we obtain a neighbor $X''$ of $X$ that is closer to
$X'$ in the sense that $|A''_1| > |A_1|$. This implies that there is a path in $\mathcal{G}_S$ from $X$
to $X'$ of length at most $|A'_1 \setminus A_1|$. □

Proof of Lemma 2. Let $X = [(A_1, B_1), (A_2, B_2), (A_3, B_3)]$ be a matchable
split. Let $a$ be any vertex in $A_1$; then the matchability of $(A_1, B_1)$ implies that the
graph $(A_1 \setminus \{a\}, B_1)$ has a perfect matching. Thus, with respect to $a^*$ and $X$,
the conditions (N1)-(N3) hold, implying that they are perfectly nested. Lemma 4
hence implies that there are paths in $\mathcal{G}_S$ from $a^*$ to $X$ and from $X$ to $a^*$.

Similarly, we can argue that there are paths in $\mathcal{G}_S$ between $X$ and $b^*$ for any
$b \in B_2$. In particular, there are paths in $\mathcal{G}_S$ between $a^*$ and $b^*$ for any $a \in A$
and $b \in B$. The lemma follows. □

5 Some Generalizations and Related Problems 

5.1 d-Factors

Let $G = (A, B, E)$ be a bipartite graph, and let $d : A \cup B \to \{0, 1, \ldots, |A| + |B|\}$ be
a non-negative function assigning integer weights to the vertices of $G$. We shall
assume in what follows that, for each vertex $v \in A \cup B$, the degree of $v$ in $G$ is
at least $d(v)$. A $d$-factor of $G$ is a subgraph $(A, B, X)$ of $G$ in which each vertex
$v \in A \cup B$ has degree $d(v)$ (see Section 10 in [16]); perfect matchings, for instance, are
the 1-factors. Note that the $d$-factors of $G$ are in one-to-one correspondence with
the vertices of the polytope $P(G) = \{x \in \mathbb{R}^E \mid Hx = d,\ 0 \le x \le 1\}$, where $H$
is the incidence matrix of $G$. In particular, checking whether a bipartite graph has a
$d$-factor can be done in polynomial time by solving a capacitated transportation
problem with edge capacities equal to 1; see, e.g., [6].
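One standard way to solve the capacitated transportation problem mentioned above is a maximum flow. Connect a source to each $a \in A$ with capacity $d(a)$, each edge $a \to b$ with capacity 1, and each $b \in B$ to a sink with capacity $d(b)$. A $d$-factor exists exactly when the flow saturates all source arcs. The sketch below uses Edmonds-Karp. The names `find_d_factor`, `dA` and `dB` are our own, and a simple graph is assumed.

```python
from collections import deque


def find_d_factor(A, B, edges, dA, dB):
    """Return a d-factor (a set of edges) of the bipartite graph (A, B, edges), or None.

    dA[a] and dB[b] are the prescribed degrees; edges are distinct pairs (a, b).
    """
    if sum(dA[a] for a in A) != sum(dB[b] for b in B):
        return None
    S, T = ("source",), ("sink",)
    cap = {}                                    # residual capacities cap[u][v]

    def add_arc(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0)
        cap[u][v] += c
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse arc for the residual graph

    for a in A:
        add_arc(S, ("A", a), dA[a])             # source -> a with capacity d(a)
    for b in B:
        add_arc(("B", b), T, dB[b])             # b -> sink with capacity d(b)
    for a, b in edges:
        add_arc(("A", a), ("B", b), 1)          # capacity 1: each edge used at most once

    flow = 0
    while True:                                 # Edmonds-Karp: shortest augmenting paths
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

    if flow != sum(dA[a] for a in A):
        return None
    # An edge belongs to the factor iff its unit capacity is saturated.
    return {(a, b) for a, b in edges if cap[("A", a)][("B", b)] == 0}


K22 = [(a, b) for a in range(2) for b in range(2)]
print(find_d_factor([0, 1], [0, 1], K22, {0: 1, 1: 1}, {0: 1, 1: 1}))
```

Vertices are tagged `("A", a)` and `("B", b)`, so both sides may reuse the same labels.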

Since the d-factors are the vertices of a 0/1-polytope, they can be computed 
with polynomial delay [5]. We present below a more efficient procedure. 

Theorem 3. For any given bipartite graph $G = (A, B, E)$ and any non-negative
integer function $d : A \cup B \to \{0, 1, \ldots, |A| + |B|\}$, after computing the first $d$-factor, all other $d$-factors of $G$ can be enumerated with delay $O(|E|)$.
Proof. We use the flashlight method. Let $\mathcal{M}_d(G)$ be the set of $d$-factors of $G$.
It is enough to check that condition (F1) is satisfied. Given $S_1, S_2 \subseteq E$, we can
check in polynomial time whether there is an $X \in \mathcal{M}_d(G)$ such that $X \supseteq S_1$ and
$X \cap S_2 = \emptyset$ by checking the existence of a $d'$-factor in the graph $(A, B, E \setminus (S_1 \cup S_2))$,
where $d'(v) = d(v) - |\{e \in S_1 : v \in e\}|$ for every vertex $v \in A \cup B$; in particular, $d'(v) = d(v)$
for every vertex not incident to an edge of $S_1$.

A more efficient procedure can be obtained as a straightforward generalization
of the one in [9]. It is a slight modification of the flashlight method that
avoids checking the existence of a $d$-factor at each node in the backtracking tree.
Given a $d$-factor $X$ in $G$, it is possible to find another one, if it exists, by the
following procedure. Orient all the edges in $X$ from $A$ to $B$ and all the other
edges in $E \setminus X$ from $B$ to $A$. Then it is not difficult to see that the resulting
directed graph $G'$ has a directed cycle if and only if $G$ contains another $d$-factor
$X' \in \mathcal{M}_d(G)$. Such an $X'$ can be found by finding a directed cycle $C$ in $G'$ and
taking the symmetric difference between the edges of $C$ and $X$. (A directed cycle alternates between edges of $X$ and edges of $E \setminus X$, so flipping it preserves every vertex degree.)
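Here is a sketch of this step in Python. Factor edges are oriented from A to B and all other edges from B to A. A depth-first search then finds a directed cycle, and the cycle is flipped. The function names are ours. Vertices are tagged by side so that A and B may share labels.

```python
def alternating_cycle(edges, factor):
    """Directed cycle in the orientation 'factor edges A->B, other edges B->A', or None.

    The cycle is returned as a list of edges (a, b).
    """
    adj = {}
    for a, b in edges:
        u, v = (("A", a), ("B", b)) if (a, b) in factor else (("B", b), ("A", a))
        adj.setdefault(u, []).append((v, (a, b)))
    state, on_path, path_edges = {}, [], []     # state: 1 = on the DFS path, 2 = finished

    def dfs(u):
        state[u] = 1
        on_path.append(u)
        for v, e in adj.get(u, []):
            if state.get(v) == 1:               # back arc closes a directed cycle
                return path_edges[on_path.index(v):] + [e]
            if v not in state:
                path_edges.append(e)
                found = dfs(v)
                if found:
                    return found
                path_edges.pop()
        state[u] = 2
        on_path.pop()
        return None

    for u in list(adj):
        if u not in state:
            found = dfs(u)
            if found:
                return found
    return None


def another_factor(edges, factor):
    """Another d-factor obtained by flipping an alternating cycle, or None if X is unique."""
    cycle = alternating_cycle(edges, factor)
    return None if cycle is None else set(factor) ^ set(cycle)


K22 = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(another_factor(K22, {(0, 0), (1, 1)}))
```

The search is linear in the size of the graph, which matches the per-node cost the backtracking tree below relies on.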

Now, the backtracking algorithm proceeds as follows. Each node $w$ of the
backtracking tree is identified with a weight function $d_w$, a $d_w$-factor $X_w$, and
a bipartite graph $G_w = (A, B, E_w)$. At the root $r$ of the tree, we compute a
$d$-factor $X_r$ in $G$, and let $d_r = d$ and $G_r = G$. At any node $w$ of the tree, we
check whether there is another $d_w$-factor $X'_w$ using the above procedure, and if there
is none, we define $w$ to be a leaf. Otherwise, we let $e_w$ be an arbitrary edge in
$X_w \setminus X'_w$. This defines the two children $u$ and $z$ of $w$ by: $d_u(v) = d_w(v) - 1$ for
$v \in e_w$ and $d_u(v) = d_w(v)$ for all other vertices $v \in A \cup B$, $X_u = X_w \setminus \{e_w\}$, $G_u = G_w - e_w$,
$d_z = d_w$, $X_z = X'_w$, and finally, $G_z = G_w - e_w$. The $d$-factors computed at the
nodes of the backtracking tree (each completed by the edges fixed on the way from the root) are distinct, and they form the complete set of
$d$-factors of $G$. □
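The whole backtracking tree fits in a short sketch. As in Theorem 3, the first factor is supplied by the caller, for instance by a max-flow computation. The degree functions $d_w$ stay implicit. To report every factor as a $d$-factor of the original $G$, we carry the set of edges forced on the way from the root. Names are ours, and the cycle search repeats the one sketched for the previous step.

```python
def alternating_cycle(edges, factor):
    """Directed cycle with factor edges oriented A->B and other edges B->A, or None."""
    adj = {}
    for a, b in edges:
        u, v = (("A", a), ("B", b)) if (a, b) in factor else (("B", b), ("A", a))
        adj.setdefault(u, []).append((v, (a, b)))
    state, on_path, path_edges = {}, [], []

    def dfs(u):
        state[u] = 1
        on_path.append(u)
        for v, e in adj.get(u, []):
            if state.get(v) == 1:
                return path_edges[on_path.index(v):] + [e]
            if v not in state:
                path_edges.append(e)
                found = dfs(v)
                if found:
                    return found
                path_edges.pop()
        state[u] = 2
        on_path.pop()
        return None

    for u in list(adj):
        if u not in state:
            found = dfs(u)
            if found:
                return found
    return None


def enumerate_d_factors(edges, first_factor):
    """All d-factors of G, given one of them, via the two-child backtracking tree."""
    found = [frozenset(first_factor)]

    def node(cur_edges, factor, forced):
        cycle = alternating_cycle(cur_edges, factor)
        if cycle is None:
            return                               # leaf: this node's factor is unique
        other = factor ^ set(cycle)              # X'_w, another factor of G_w
        e = next(iter(factor - other))           # e_w in X_w minus X'_w
        rest = [f for f in cur_edges if f != e]  # G_w - e_w
        node(rest, factor - {e}, forced | {e})   # child u: factors containing e_w
        found.append(frozenset(other | forced))  # child z brings a new factor: report it
        node(rest, other, forced)                # child z: factors avoiding e_w

    node(list(edges), set(first_factor), frozenset())
    return found


K33 = [(a, b) for a in range(3) for b in range(3)]
print(len(enumerate_d_factors(K33, {(0, 0), (1, 1), (2, 2)})))  # 6 perfect matchings
```

Each factor is reported exactly once: at the root, or when a $z$-child is created. The $u$-children inherit their parent's factor.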



5.2 2-Matchings in General Graphs 

Let $G = (V, E)$ be an undirected graph with vertex set $V$ and edge set $E$. Let $H$
be the incidence matrix of $G$. As stated in the introduction, if $G$ is a bipartite
graph, then the perfect matchings of $G$ correspond to the vertices of the polytope
$P(G) = \{x \in \mathbb{R}^E \mid Hx = e,\ x \ge 0\}$, where $e$ is the all-ones vector. In general graphs, the vertices of $P(G)$ are
in one-to-one correspondence with the basic perfect 2-matchings of $G$, i.e., subsets
of edges that form a cover of the vertices with vertex-disjoint edges and vertex-disjoint odd cycles (see [16]). A (not necessarily basic) perfect 2-matching of $G$
is a subset of edges that covers the vertices of $G$ with vertex-disjoint edges and
vertex-disjoint (even or odd) cycles. Denote respectively by $\mathcal{M}_2(G)$ and $\mathcal{M}'_2(G)$
the families of perfect 2-matchings and basic perfect 2-matchings of a graph
$G$. We show below that the family $\mathcal{M}_2(G)$ can be enumerated with polynomial
delay, and the family $\mathcal{M}'_2(G)$ can be enumerated in incremental polynomial time.
Theorem 2 will follow from the following two lemmas.

Lemma 5. All perfect 2-matchings of a graph G can be generated with polyno- 
mial delay. 
Proof. We use the flashlight method with a slight modification. For $X \subseteq E$,
let $\pi(X)$ be the property that the graph $(V, X)$ has a perfect 2-matching. Then
$\mathcal{F}_\pi = \mathcal{M}_2(G)$. Define $\mathcal{F}'_\pi = \{X \subseteq E :$ the graph $(V, X)$ is a vertex-disjoint
union of some cycles, some disjoint edges, and possibly one path, which may be
of length one$\}$. Clearly, any $X \in \mathcal{M}_2(G)$ contains some set $X' \in \mathcal{F}'_\pi$. Given
$S_1 \in \mathcal{F}'_\pi$ and $S_2 \subseteq E$, we modify the basic approach described in Section 3.2 in two
ways. First, when we consider a new edge $e \in E \setminus (S_1 \cup S_2)$ to be added to $S_1$,
we first try an edge incident to the path in $S_1$, if one exists. If no such path
exists, then any edge $e \in E \setminus (S_1 \cup S_2)$ can be chosen and defined to be a path
of length one in $S_1 \cup \{e\}$. Second, when we backtrack, for the first time, on an
edge $e$ defining a path of length one in $S_1$, we redefine $S_1$ by considering $e$ as a
single edge rather than a path of length one. Now it remains to check that (F2)
is satisfied. Given $S_1 \in \mathcal{F}'_\pi$, $S_2 \subseteq E$, and an edge $e \in E \setminus (S_1 \cup S_2)$, chosen as
above, such that $S_1 \cup \{e\} \in \mathcal{F}'_\pi$, we can check in polynomial time whether there
is an $X \in \mathcal{M}_2(G)$ such that $X \supseteq S_1 \cup \{e\}$ and $X \cap S_2 = \emptyset$, in the following
way. First, we delete all edges in $S_2$ from $G$. Second, we construct an auxiliary
bipartite graph $\hat{G}$ as follows (see [16]). For every vertex $v$ of $G$ we define two
vertices $v'$ and $v''$ in $\hat{G}$, and for every edge $\{u, v\}$ in $G$ we define two edges
$\{u', v''\}$ and $\{u'', v'\}$ in $\hat{G}$. Third, we orient all the edges in $S_1 \cup \{e\}$ in such a
way that all cycles and the path (if it exists) become directed. Single edges in
$S_1 \cup \{e\}$ get double orientations. Finally, we mark the edges in $\hat{G}$ corresponding
to the oriented arcs in $S_1 \cup \{e\}$: for an arc $(u, v)$, we mark the corresponding
edge $\{u', v''\}$ in $\hat{G}$. It is easy to see that there is an $X \in \mathcal{M}_2(G)$ such that
$X \supseteq S_1 \cup \{e\}$ and $X \cap S_2 = \emptyset$ if and only if there is a perfect matching in $\hat{G}$
extending the marked edges. □
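The final reduction can be tried out on small graphs. This is our own illustration, and the function name is hypothetical. The bipartite double cover $\hat{G}$ is built implicitly: each left copy $u'$ is adjacent to $v''$ for every edge $\{u, v\}$. The marked arcs are fixed, and Kuhn's augmenting-path algorithm runs on what is left. Deleting $S_2$ is left to the caller (pass $E \setminus S_2$ as `edges`).

```python
def has_perfect_2_matching(vertices, edges, marked_arcs=()):
    """Perfect 2-matching test via a perfect matching of the bipartite double cover.

    Each arc (u, v) in `marked_arcs` forces the cover edge {u', v''} into the matching.
    """
    adj = {v: set() for v in vertices}          # u' -- v'' and v' -- u'' for each edge {u, v}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    used_left, used_right = set(), set()
    for u, v in marked_arcs:
        if v not in adj[u] or u in used_left or v in used_right:
            return False                        # marked cover edges must exist and be disjoint
        used_left.add(u)
        used_right.add(v)
    match_of_right = {}

    def augment(u, visited):
        for v in adj[u]:
            if v in used_right or v in visited:
                continue
            visited.add(v)
            if v not in match_of_right or augment(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    return all(augment(u, set()) for u in vertices if u not in used_left)


print(has_perfect_2_matching([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))  # True: an odd cycle
```

A triangle has a perfect 2-matching but no perfect matching. This is exactly the kind of case in which the double cover differs from the original graph.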

Lemma 6. Let $\mathcal{M}_2(G)$ and $\mathcal{M}'_2(G)$ be respectively the families of perfect 2-matchings and basic perfect 2-matchings of a graph $G = (V, E)$. Then



Proof of Theorem 2. By generating all perfect 2-matchings of $G$ and discarding the non-basic ones, we can generate all basic perfect 2-matchings.
By Lemma 6, the total time for this generation is polynomial in $|V|$, $|E|$, and
$|\mathcal{M}'_2(G)|$. Since the problem is self-reducible, we can convert this output-polynomial
generation algorithm into an incremental one; see [3,4,8] for more details. □
Abstract. Direct routing is the special case of bufferless routing where
N packets, once injected into the network, must be routed along specific 
paths to their destinations without conflicts. We give a general treatment 
of three facets of direct routing: 

(i) Algorithms. We present a polynomial time greedy algorithm for arbi- 
trary direct routing problems which is worst-case optimal, i.e., there 
exist instances for which no direct routing algorithm is better than 
the greedy. We apply variants of this algorithm to commonly used 
network topologies. In particular, we obtain near-optimal routing 
time for the tree and d-dimensional mesh, given arbitrary sources 
and destinations; for the butterfly and the hypercube, the same result 
holds for random destinations. 

(ii) Complexity. By a reduction from Vertex Coloring, we show that 
Direct Routing is inapproximable, unless P=NP. 

(iii) Lower Bounds for Buffering. We show that certain direct routing 
problems cannot be solved efficiently; to solve these problems, any 
routing algorithm needs buffers. We give non-trivial lower bounds 
on such buffering requirements for general routing algorithms. 



1 Introduction 

Direct routing is the special case of bufferless routing where N packets are routed 
along specific paths from source to destination so that they do not conflict with 
each other, i.e., once injected, the packets proceed “directly” to their destination 
without being delayed (buffered) at intermediate nodes. Since direct routing 
is bufferless, packets spend the minimum possible time in the network given 
the paths they must follow - this is appealing in power/resource constrained 
environments (for example optical networks or sensor networks). From the point 
of view of quality of service, it is often desirable to provide a guarantee on the 
delivery time after injection, for example in streaming applications like audio 
and video. Direct routing can provide such guarantees. 

The task of a direct routing algorithm is to compute the injection times 
of the packets so as to minimize the routing time, which is the time at which 
the last packet is delivered to its destination. Algorithms for direct routing are
inherently offline in order to guarantee no conflicts. We give a general analysis of
three aspects of direct routing, namely efficient algorithms for direct routing, the
computational complexity of direct routing, and the connection between direct
routing and buffering.

Algorithms for Direct Routing. We give a general and efficient (polynomial time)
algorithm. We modify this algorithm for specific routing problems on commonly
used network topologies to obtain better routing times; thus, in many
cases, efficient routing can be achieved without the use of buffers.

Arbitrary: We give a simple greedy algorithm which considers packets in some 
order, assigning the first available injection time to each packet. This algo- 
rithm is worst-case optimal: there exist instances of direct routing for which 
no direct routing algorithm achieves better routing time. 

Tree: For arbitrary sources and destinations on arbitrary trees, we give a direct
routing algorithm with routing time $rt = O(rt^*)$, where $rt^*$ is the minimum
possible routing time achievable by any routing algorithm (direct or not).

d-Dimensional Mesh: For arbitrary sources and destinations on a $d$-dimensional
mesh with $n$ nodes, we give a direct routing algorithm with routing time
$rt = O(d^2 \cdot \log^2 n \cdot rt^*)$ with high probability (w.h.p.).

Butterfly and Hypercube: For permutation and random destination problems
with one packet per node, we obtain routing time $rt = O(rt^*)$ w.h.p. for the
butterfly with $n$ inputs and the $n$-node hypercube.

Computational Complexity of Direct Routing. By a reduction from vertex color- 
ing, we show that direct routing is NP-complete. The reduction is gap-preserving, 
so direct routing is as hard to approximate as coloring. 

Lower Bounds for Buffering. There exist direct routing problems which cannot be efficiently solved; such problems require buffering. We show that for any
buffered algorithm there exist routing instances for which packets are buffered
times in order to achieve near-optimal routing time.

Next, we discuss related work, followed by preliminaries and main results. 



Related Work 

The only previously known work on direct routing is for permutation problems on
trees [1,2], where the authors obtain routing time $O(n)$ for any tree with $n$ nodes,
which is worst-case optimal. Our algorithm for trees is every-case optimal for
arbitrary routing problems. Cypher et al. [3] study an online version of direct
routing in which a worm (packet of length $L$) can be re-transmitted if it is
dropped (they also allow the links to have bandwidth $B > 1$). For the case
corresponding to our work ($L = B = 1$), they give an algorithm with routing
time $O((C + \log n) \cdot D)$. We give an offline algorithm with routing time $O(C \cdot D)$,
show that this is worst-case optimal, and show that it is NP-hard to give a good
approximation to the optimal direct routing time. We also obtain near-optimal
routing time (with respect to buffered routing) for many interesting networks,
for example the mesh.

A dual to direct routing is time constrained routing where the task is to 
schedule as many packets as possible within a given time frame [4]. In these 
papers, the authors show that the time constrained version of the problem is 
NP-complete, and also study approximation algorithms on linear networks, trees 
and meshes. They also discuss how much buffering could help in this setting. 

Other models of bufferless routing in which packets do not follow specific 
paths are matching routing [5] and hot-potato routing [6,7,8]. In [6], the authors 
also study lower bounds for near-greedy hot-potato algorithms on the mesh. Op- 
timal routing for given paths on arbitrary networks has been studied extensively 
in the context of store-and-forward algorithms [9,10,11].

2 Preliminaries 

Problem Definition. We are given a graph $G = (V, E)$ with $n > 1$ nodes, and
a set of packets $\Pi = \{\pi_i\}_{i=1}^N$. Each packet $\pi_i$ is to be routed from its source,
$s(\pi_i) \in V$, to its destination, $\delta(\pi_i) \in V$, along a pre-specified path $p_i$. The
nodes in the graph are synchronous: time is discrete and all nodes take steps
simultaneously. At each time step, at most one packet can follow a link in each
direction (thus, at most two packets can follow a link at the same time, one
packet in each direction).

A path $p$ is a sequence of vertices $p = (v_1, v_2, \ldots, v_k)$. Two paths $p_1$ and
$p_2$ collide if they share an edge in the same direction. We also say that their
respective packets $\pi_1$ and $\pi_2$ collide. Two packets conflict if they are routed in
such a way that they appear in the same node at the same time, and the next
edge in their paths is the same. The length of a path $p$, denoted $|p|$, is the number
of edges in the path. For any edge $e = (v_i, v_j) \in p$, let $d_p(e)$ denote the length of
the path $(v_1, \ldots, v_i, v_j)$. The distance between two nodes is the length of a shortest
path that connects the two nodes.

A direct routing problem has the following components. Input: $(G, \Pi, \mathcal{P})$,
where $G$ is a graph, and the packets $\Pi = \{\pi_i\}_{i=1}^N$ have respective paths
$\mathcal{P} = \{p_i\}_{i=1}^N$. Output: The injection times $\mathcal{T} = \{\tau_i\}_{i=1}^N$, called a routing schedule
for the routing problem. Validity: If each packet $\pi_i$ is injected at its corresponding time $\tau_i$ into its source $s_i$, then it will follow a conflict-free path to its
destination, where it will be absorbed at time $t_i = \tau_i + |p_i|$.

For a routing problem $(G, \Pi, \mathcal{P})$, the routing time $rt(G, \Pi, \mathcal{P})$ is the maximum time at which a packet gets absorbed at its destination, $rt(G, \Pi, \mathcal{P}) =
\max_i \{\tau_i + |p_i|\}$. The offline time, $ol(G, \Pi, \mathcal{P})$, is the number of operations used
to compute the routing schedule $\mathcal{T}$. We measure the efficiency of a direct routing
algorithm with respect to the congestion $C$ (the maximum number of packets
that use an edge) and the dilation $D$ (the maximum length of any path). A well-known
lower bound on the routing time of any routing algorithm (direct or not)
is given by $\Omega(C + D)$.
Dependency Graphs. Consider a routing problem $(G, \Pi, \mathcal{P})$. The dependency
graph $\mathcal{D}$ of the routing problem is a graph in which each packet $\pi_i \in \Pi$ is a
node. We will use $\pi_i$ to refer to the corresponding node in $\mathcal{D}$. There is an edge
between two packets in $\mathcal{D}$ if their paths collide. An edge $(\pi_i, \pi_j)$ with $i < j$ in $\mathcal{D}$
has an associated set of weights $W_{i,j}$: $w \in W_{i,j}$ if and only if $\pi_i$ and $\pi_j$ collide
on some edge $e$ for which $d_{p_i}(e) - d_{p_j}(e) = w$. Thus, in a valid direct routing
schedule with injection times $\tau_i, \tau_j$ for $\pi_i, \pi_j$, it must be that $\tau_j - \tau_i \notin W_{i,j}$.
An illustration of a direct routing problem and its corresponding dependency graph
is shown in Figure 1.




[Figure: panels "routing problem $(G, \Pi, \mathcal{P})$" and "dependency graph $\mathcal{D}$"; edge weight sets $W_{1,2} = \{0, -2\}$ and $W_{1,3} = \{2\}$.]

Fig. 1. An example direct routing problem and its dependency graph.



We say that two packets are synchronized if the packets are adjacent in $\mathcal{D}$
with some edge $e$ and 0 is in the weight set of $e$. A clique $\mathcal{K}$ in $\mathcal{D}$ is synchronized
if all the packets in $\mathcal{K}$ are synchronized, i.e., if 0 is in the weight set of every
edge in $\mathcal{K}$. No pair in a synchronized clique can have the same injection time,
as otherwise they would conflict. Thus, the size of the maximum synchronized
clique in $\mathcal{D}$ gives a lower bound on the routing time:

Lemma 1 (Lower Bound on Routing Time). Let $\mathcal{K}$ be a maximum synchronized clique in the dependency graph $\mathcal{D}$. Then, for any routing algorithm,

$rt(G, \Pi, \mathcal{P}) \ge |\mathcal{K}|$.

We define the weight degree of an edge $e$ in $\mathcal{D}$, denoted $W(e)$, as the size of the
edge's weight set. We define the weight degree of a node $\pi$ in $\mathcal{D}$, denoted $W(\pi)$,
as the sum of the weight degrees of all edges incident with $\pi$. We define the
weight of the dependency graph, $W(\mathcal{D})$, as the sum of the weight degrees of all
its edges, $W(\mathcal{D}) = \sum_{e \in E(\mathcal{D})} W(e)$. For the example in Figure 1, $W(\mathcal{D}) = 3$.
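The weight sets and weight degrees can be computed directly from the paths. Here is a sketch with our own function names. Packets are indexed from 0, paths are vertex lists, and each directed edge is assumed to occur at most once per path. For example, paths $(0,1,2,3)$ and $(1,2,3)$ share edges $(1,2)$ and $(2,3)$, each at offset 1, so $W_{0,1} = \{1\}$. Starting packet 1 exactly one step after packet 0 would make them conflict.

```python
def dependency_graph(paths):
    """Weight sets W[(i, j)], i < j, of the dependency graph.

    w is in W[(i, j)] iff paths i and j share a directed edge e with
    d_{p_i}(e) - d_{p_j}(e) = w, where d_p(e) is the length of the prefix of p ending with e.
    """
    d = [{e: k + 1 for k, e in enumerate(zip(p, p[1:]))} for p in paths]
    W = {}
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            shared = d[i].keys() & d[j].keys()
            if shared:
                W[(i, j)] = {d[i][e] - d[j][e] for e in shared}
    return W


def node_weight_degree(W, i):
    """W(pi_i): total size of the weight sets on the edges incident with packet i."""
    return sum(len(ws) for (a, b), ws in W.items() if i in (a, b))


def graph_weight(W):
    """W(D): total size of all weight sets."""
    return sum(len(ws) for ws in W.values())


print(dependency_graph([[0, 1, 2, 3], [1, 2, 3]]))  # {(0, 1): {1}}
```

A schedule is then valid exactly when $\tau_j - \tau_i \notin W_{i,j}$ for every edge of $\mathcal{D}$.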

3 Algorithms for Direct Routing 

Here we consider algorithms for direct routing. The algorithms we consider are 
variations of the following greedy algorithm, which we apply to the tree, the 
mesh, the butterfly and the hypercube. 
// Greedy direct routing algorithm:
// Input: routing problem $(G, \Pi, \mathcal{P})$ with $N$ packets $\Pi = \{\pi_i\}_{i=1}^N$.
// Output: set of injection times $\mathcal{T} = \{\tau_i\}_{i=1}^N$.
Let $\pi_1, \ldots, \pi_N$ be any specific, arbitrarily chosen ordering of the packets.
for $i = 1$ to $N$ do
    Greedily assign the first available injection time to packet $\pi_i \in \Pi$, so
    that it does not conflict with any packet already assigned an injection
    time.
end for

The greedy direct routing algorithm is really a family of algorithms, one for
each specific ordering of the packets. It is easy to show by induction that no
packet $\pi_j$ conflicts with any packet $\pi_i$ with $i < j$, and thus the greedy algorithm
produces a valid routing schedule. The routing time for the greedy algorithm
will be denoted $rt_{Gr}(G, \Pi, \mathcal{P})$. Consider the dependency graph $\mathcal{D}$ for the routing
problem $(G, \Pi, \mathcal{P})$. We can show that $\tau_i \le W(\pi_i)$, where $W(\pi_i)$ is the weight
degree of packet $\pi_i$ (each weight on an edge to an earlier packet rules out at most one injection time), which implies:

Lemma 2. $rt_{Gr}(G, \Pi, \mathcal{P}) \le \max_i \{W(\pi_i) + |p_i|\}$.

We now give an upper bound on the routing time of the greedy algorithm. Since
the congestion is $C$ and $|p_i| \le D$ for all $i$, a packet collides with other packets at most
$(C - 1) \cdot D$ times. Thus, $W(\pi_i) \le (C - 1) \cdot D$ for all $i$. Therefore, using Lemma 2, we
obtain:

Theorem 1 (Greedy Routing Time Bound). $rt_{Gr}(G, \Pi, \mathcal{P}) \le C \cdot D$.
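Here is a direct Python rendering of the greedy pseudocode. It uses a timing convention we assume for illustration: a packet injected at time $\tau$ crosses the $k$-th edge of its path ($k = 0, 1, 2, \ldots$) during step $\tau + k$. Under this convention, two packets conflict exactly when they cross the same directed edge in the same step. The function names are ours, and the search for the first free time is the naive one rather than an implementation tuned to the offline bound.

```python
def greedy_direct_routing(paths, order=None):
    """Give each packet, in the given order, the first conflict-free injection time.

    paths[i] is the vertex sequence of packet i.
    """
    if order is None:
        order = range(len(paths))
    busy = {}                          # directed edge -> steps at which it is already used
    tau = [None] * len(paths)
    for i in order:
        edges = list(zip(paths[i], paths[i][1:]))
        t = 0
        while any(t + k in busy.get(e, ()) for k, e in enumerate(edges)):
            t += 1                     # first available injection time
        tau[i] = t
        for k, e in enumerate(edges):
            busy.setdefault(e, set()).add(t + k)
    return tau


def routing_time(paths, tau):
    """rt = max_i (tau_i + |p_i|), where |p_i| is the number of edges of path i."""
    return max(t + len(p) - 1 for p, t in zip(paths, tau))


paths = [[0, 1, 2], [0, 1, 2], [3, 1, 2]]
tau = greedy_direct_routing(paths)
print(tau, routing_time(paths, tau))  # [0, 1, 2] 4
```

All three packets end with edge $(1, 2)$ at the same position, so they form a synchronized clique and must be staggered. The routing time 4 is within the bound $C \cdot D = 3 \cdot 2 = 6$ of Theorem 1.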

The general $O(C \cdot D)$ bound on the routing time of the greedy algorithm is
worst-case optimal, within constant factors, since by Theorem 9 there exist
worst-case routing problems with $\Theta(C \cdot D)$ routing time. In the next sections, we
will show how the greedy algorithm can do better for particular routing problems
by using a more careful choice of the order in which packets are considered.

Now we discuss the offline time of the greedy algorithm. Each time an edge
on a packet's path is used by some other packet, the greedy algorithm will need
to desynchronize these packets if necessary. This will occur at most $C \cdot D$ times
for a packet; hence, the offline computation time of the greedy algorithm is
$ol_{Gr}(G, \Pi, \mathcal{P}) = O(N \cdot C \cdot D)$, which is polynomial. This bound is tight since, in
the worst case, each packet may have $C \cdot D$ collisions with other packets.

3.1 Trees 

Consider the routing problem $(T, \Pi, \mathcal{P})$, in which $T$ is a tree with $n$ nodes,
and all the paths in $\mathcal{P}$ are shortest paths. Shortest paths are optimal on trees
for given sources and destinations, because any path must contain the shortest path.
Thus, $rt^* = \Theta(C + D)$, where $rt^*$ is the minimum routing time for the given
sources and destinations using any routing algorithm. The greedy algorithm with
a particular order in which the packets are considered gives an asymptotically
optimal schedule.
Let $r$ be an arbitrary node of $T$. Let $d_i$ be the smallest distance between $r$ and a node on $\pi_i$'s path. The direct routing algorithm can now be simply stated as the greedy
algorithm with the packets considered in sorted order according to this distance,
$d_{i_1} \le d_{i_2} \le \cdots \le d_{i_N}$.

Theorem 2. Let $(T, \Pi, \mathcal{P})$ be any routing problem on the tree $T$. Then the
routing time of the greedy algorithm using the distance-ordered packets is
$rt(T, \Pi, \mathcal{P}) \le 2C + D - 2$.

Proof. We show that every injection time satisfies $\tau_i \le 2C - 2$. When packet $\pi_i$
with distance $d_i$ is considered, let $v_i$ be the closest node to $r$ on its path. All
packets that are already assigned times and could possibly conflict with $\pi_i$ are
those that use the two edges in $\pi_i$'s path incident with $v_i$; hence there are at
most $2C - 2$ such packets. Each of them rules out at most one injection time, since two paths in a tree share at most one contiguous common subpath. Since $\pi_i$ is assigned the smallest available injection
time, it must therefore be assigned a time in $[0, 2C - 2]$.
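The distance ordering is a one-line change to the greedy rule. Here is a self-contained sketch with a name of our choosing. It computes breadth-first distances from the chosen node $r$ and sorts packets by $d_i$, keeping ties in input order. It then applies the same first-free-time rule under the timing convention used above, in which a packet injected at $t$ crosses its $k$-th edge at step $t + k$.

```python
from collections import deque


def tree_direct_routing(tree_edges, paths, root):
    """Greedy direct routing on a tree, packets taken in nondecreasing order of d_i.

    d_i is the smallest distance from `root` to a node of packet i's path.
    """
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist, queue = {root: 0}, deque([root])     # BFS distances from the root
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    order = sorted(range(len(paths)), key=lambda i: min(dist[v] for v in paths[i]))
    busy, tau = {}, [0] * len(paths)           # busy: directed edge -> occupied steps
    for i in order:                            # the plain greedy rule, in distance order
        edges = list(zip(paths[i], paths[i][1:]))
        t = 0
        while any(t + k in busy.get(e, ()) for k, e in enumerate(edges)):
            t += 1
        tau[i] = t
        for k, e in enumerate(edges):
            busy.setdefault(e, set()).add(t + k)
    return tau


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
paths = [[1, 0, 2], [3, 0, 2], [4, 0, 2], [2, 0, 1]]
print(tree_direct_routing(star, paths, root=0))  # [0, 1, 2, 0]
```

On this star, edge $(0, 2)$ carries three packets ($C = 3$, $D = 2$). The resulting routing time is 4, inside the bound $2C + D - 2 = 6$ of Theorem 2.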

3.2 d-Dimensional Mesh 

A $d$-dimensional mesh network $M = M(m_1, m_2, \ldots, m_d)$ is a multi-dimensional
grid of nodes with side length $m_i$ in dimension $i$. The number of nodes is
$n = \prod_{i=1}^{d} m_i$; define $m = \sum_{i=1}^{d} m_i$. Every node is connected to up to
$2d$ of its neighbors on the grid. Theorem 1 implies that the greedy routing algorithm achieves asymptotically optimal worst-case routing time in the mesh. We
discuss some important special cases where the situation is considerably better.
In particular, we give a variation of the greedy direct routing algorithm which
is analyzed in terms of the number of times that the packet paths “bend” on
the mesh. We then apply this algorithm to the 2-dimensional mesh in order to
obtain optimal permutation routing, and to the $d$-dimensional mesh in order to
obtain near-optimal routing, given arbitrary sources and destinations.

Multi-bend Paths. Here, we give a variation of the greedy direct routing algorithm which we analyze in terms of the number of times a packet bends in the
network. Consider a routing problem $(G, \Pi, \mathcal{P})$. We first give an upper bound on
the weight $W(\mathcal{D})$ of the dependency graph $\mathcal{D}$ in terms of the bends of the paths.
We then use this weight bound in order to obtain an upper bound on the
routing time of the algorithm.

For any subset of packets $\Pi' \subseteq \Pi$, let $\mathcal{D}_{\Pi'}$ denote the subgraph of $\mathcal{D}$ induced
by the set of packets $\Pi'$. (Note that $\mathcal{D} = \mathcal{D}_\Pi$.) Consider the path $p$ of a packet $\pi$.
Let us assume that $p = (\ldots, v_i, v, v_j, \ldots)$, such that the edges $(v_i, v)$ and $(v, v_j)$
are in different dimensions. We say that the path of packet $\pi$ bends at node $v$,
and that $v$ is an internal bending node. We define the source and destination
nodes of a packet $\pi$ to be external bending nodes. A segment $p' = (v_i, \ldots, v_j)$
of a path $p$ is a subpath of $p$ in which only $v_i$ and $v_j$ are bending nodes. Consider
two packets $\pi_1$ and $\pi_2$ whose respective paths $p_1$ and $p_2$ collide at some edge $e$.
Let $p'_1$ and $p'_2$ be the two respective segments of $p_1$ and $p_2$ which contain $e$. Let
$p'$ be the longest subpath which is common to $p'_1$ and $p'_2$; clearly
$e$ is an edge in $p'$. Let us assume that $p' = (v_i, \ldots, v_j)$. It must be that $v_i$ is a
bending node of one of the two packets, and the same is true of $v_j$. Further,
none of the other nodes in $p'$ are bending nodes of either of the two packets. We
refer to such a path $p'$ as a common subpath. Note that there could be many common
subpaths for the packets $\pi_1$ and $\pi_2$ if they meet multiple times on their paths.
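Bends and segments are easy to compute when mesh nodes are given as coordinate tuples. A mesh edge changes exactly one coordinate by exactly one. The sketch below uses our own function names. It returns the positions of the internal bending nodes and splits a path into segments at all bending nodes, internal and external.

```python
def edge_dimension(u, v):
    """Dimension in which mesh nodes u and v (coordinate tuples) differ by one."""
    diff = [k for k in range(len(u)) if u[k] != v[k]]
    if len(diff) != 1 or abs(u[diff[0]] - v[diff[0]]) != 1:
        raise ValueError(f"{u} -> {v} is not a mesh edge")
    return diff[0]


def bending_nodes(path):
    """Indices (into `path`) of the internal bending nodes of a mesh path."""
    dims = [edge_dimension(u, v) for u, v in zip(path, path[1:])]
    return [k + 1 for k in range(len(dims) - 1) if dims[k] != dims[k + 1]]


def segments(path):
    """Split a path at its bending nodes (internal and external) into segments."""
    cuts = [0] + bending_nodes(path) + [len(path) - 1]
    return [path[a:b + 1] for a, b in zip(cuts, cuts[1:])]


p = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(bending_nodes(p), segments(p))
```

The example path turns once, at $(2, 0)$, so it has $b = 1$ internal bend and two segments.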

Since $p_1$ and $p_2$ collide on $e$, the edge $h = (\pi_1, \pi_2)$ will be present in the
dependency graph $\mathcal{D}$ with some weight $w \in W_{1,2}$ representing this collision.
Weight $w$ suffices to represent the collision of the two packets on the entire
subpath $p'$. Therefore, a common subpath contributes at most one to the weight
of $\mathcal{D}$. Let $A_{\mathcal{P}}$ denote the number of common subpaths. We have that
$W(\mathcal{D}) \le A_{\mathcal{P}}$. Therefore, in order to find an upper bound on $W(\mathcal{D})$, we only
need to find an upper bound on the number of common subpaths.

For each common subpath, one of the packets must bend at the beginning
and one at the end node of the subpath. Thus, a packet contributes to the number
of common subpaths only when it bends. Consider a packet $\pi$ which bends at a node
$v$. Let $e_1$ and $e_2$ be the two edges of the path of $\pi$ adjacent to $v$. On $e_1$ the packet
may meet with at most $C - 1$ other packets. Thus, $e_1$ contributes at most $C - 1$
to the number of common subpaths. Similarly, $e_2$ contributes at most $C - 1$
to the number of common subpaths. Thus, each internal bend contributes at
most $2C - 2$ to the number of common subpaths, and each external bend at most $C - 1$.
Therefore, for the set of packets $\Pi'$, where the maximum number of internal
bends is $b$, $A_{\mathcal{P}} \le 2(b + 1)|\Pi'|(C - 1)$. Hence, it follows:

Lemma 3. For any subset $\Pi' \subseteq \Pi$, $W(\mathcal{D}_{\Pi'}) \le 2(b + 1)|\Pi'|(C - 1)$, where $b$ is
the maximum number of internal bending nodes of any path in $\Pi'$.

Since the sum of the node weight degrees is $2W(\mathcal{D})$, the average
node weight degree of the dependency graph for any subset of the packets is
upper bounded by $4(b + 1)(C - 1)$. We say that a graph $\mathcal{D}$ is $K$-amortized if the
average weight degree of every subgraph is at most $K$. $K$-amortized graphs are
similar to balanced graphs [12]. Thus, $\mathcal{D}$ is $4(b + 1)(C - 1)$-amortized. A generalized
coloring of a graph with weights on each edge is a coloring in which the difference
between the colors of adjacent nodes cannot equal a weight. $K$-amortized graphs
admit generalized colorings with $K + 1$ colors. This is the content of the next
lemma.

Lemma 4 (Efficient Coloring of Amortized Graphs). Let $\mathcal{D}$ be a $K$-amortized graph. Then $\mathcal{D}$ has a valid $(K + 1)$ generalized coloring.

A generalized coloring of the dependency graph gives a valid injection schedule with maximum injection time one less than the largest color, since with such an injection schedule no pair of packets is synchronized. Lemma 4 implies that the dependency graph $\mathcal{D}$ has a valid $(4(b+1)(C-1)+1)$ generalized coloring. Lemma 4 essentially determines the order in which the greedy algorithm considers the packets so as to ensure the desired routing time. Hence, we get the following result:
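The peeling argument behind Lemma 4 can be illustrated with a short Python sketch. It is not the authors' algorithm, and it rests on assumptions of ours: each weight on an edge of the dependency graph is stored as a separate triple `(u, v, w)` that forbids `color[v] - color[u] == w` (the sign convention is our choice), and the weight degree of a node is the number of triples incident to it. A node of minimum weight degree is removed repeatedly, and the nodes are then colored in reverse order. Each triple to an already colored neighbor rules out one color, so when every subgraph has average weight degree at most $K$ the colors stay in $\{0, \dots, K\}$ and can be read directly as injection times.

```python
from collections import defaultdict


def generalized_coloring(nodes, weighted_edges):
    """Greedy generalized coloring (illustrative sketch).

    weighted_edges: list of (u, v, w) triples; each triple forbids
    color[v] - color[u] == w (assumed sign convention).
    Returns a dict node -> color in {0, 1, ...}.
    """
    incident = defaultdict(list)  # node -> indices of incident triples
    for idx, (u, v, _) in enumerate(weighted_edges):
        incident[u].append(idx)
        incident[v].append(idx)

    # Repeatedly peel off a node of minimum weight degree in what remains.
    remaining, alive, order = set(nodes), set(range(len(weighted_edges))), []
    while remaining:
        x = min(remaining, key=lambda n: sum(e in alive for e in incident[n]))
        order.append(x)
        remaining.discard(x)
        alive.difference_update(incident[x])

    # Color in reverse peeling order; each triple to a colored neighbor
    # forbids exactly one color, and there are at most K such triples.
    color = {}
    for x in reversed(order):
        forbidden = set()
        for e in incident[x]:
            u, v, w = weighted_edges[e]
            if u == x and v in color:
                forbidden.add(color[v] - w)  # avoid color[v] - color[x] == w
            elif v == x and u in color:
                forbidden.add(color[u] + w)  # avoid color[x] - color[u] == w
        c = 0
        while c in forbidden:
            c += 1
        color[x] = c
    return color
```

Calling `generalized_coloring(range(n), triples)` on the weight triples of a dependency graph returns one color per packet; the function name and the triple layout are hypothetical choices for this illustration.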

Theorem 3 (Multi-bend Direct Routing Time). Let $(M,\Pi,\mathcal{P})$ be a direct routing problem on a mesh $M$ with congestion $C$ and dilation $D$. Suppose that each packet has at most $b$ internal bends. Then there is a direct routing schedule with routing time $rt \le 4(b+1)(C-1) + D$.

Permutation Routing on the 2-Dimensional Mesh. Consider a $\sqrt{n} \times \sqrt{n}$ mesh. In a permutation routing problem every node is the source and destination of exactly one packet. We solve permutation routing problems by using paths with one internal bend. Consider the edges of a row. Since at most $\sqrt{n}$ packets originate in each row and at most $\sqrt{n}$ packets have their destination in each row, the congestion at each edge in the row is $O(\sqrt{n})$. The same argument applies to the edges in columns. Applying Theorem 3, and the fact that $D = O(\sqrt{n})$, we then get that $rt = O(\sqrt{n})$, which is worst-case optimal for permutation routing on the mesh.
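To make the congestion argument concrete, the following Python sketch counts edge loads for one-bend paths. It assumes row-first paths (a packet moves along its source row to its destination column, then along that column) on a $k \times k$ mesh with $k = \sqrt{n}$; the function name and the coordinate convention are illustrative. Under this assumption each row edge carries only packets that originate in its row and each column edge only packets destined for its column, so the congestion is at most $k$ and the dilation at most $2(k-1)$.

```python
import random
from collections import Counter


def one_bend_congestion(k, perm):
    """Route a permutation on a k x k mesh with row-first, one-bend paths.

    perm maps each source (row, col) to its destination (row, col).
    Returns (congestion, dilation) of the resulting set of paths.
    """
    load = Counter()  # directed edge -> number of paths using it
    dilation = 0
    for (r, c), (r2, c2) in perm.items():
        path = []
        step = 1 if c2 >= c else -1
        for x in range(c, c2, step):  # along the source row
            path.append(((r, x), (r, x + step)))
        step = 1 if r2 >= r else -1
        for y in range(r, r2, step):  # along the destination column
            path.append(((y, c2), (y + step, c2)))
        load.update(path)
        dilation = max(dilation, len(path))
    return max(load.values(), default=0), dilation


k = 16
nodes = [(r, c) for r in range(k) for c in range(k)]
dests = nodes[:]
random.Random(0).shuffle(dests)
C, D = one_bend_congestion(k, dict(zip(nodes, dests)))
print(C, D)  # C <= k and D <= 2 * (k - 1), so C + D = O(sqrt(n))
```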

Near Optimal Direct Routing on the Mesh. Maggs et al. [13, Section 3] give a strategy to select paths in the mesh $M$ for a routing problem with arbitrary sources and destinations. The congestion achieved by the paths is within a logarithmic factor of the optimal, i.e., $C = O(dC^* \log n)$ w.h.p., where $C^*$ is the minimum congestion possible for the given sources and destinations. Following the construction in [13], it can be shown that the packet paths are constructed from $O(\log n)$ shortest paths between random nodes in the mesh. Hence, the number of bends that a packet makes is $b = O(d \log n)$, and $D = O(m \log n)$, where $m$ is the sum of the side lengths. We can thus use Theorem 3 to obtain a direct routing schedule with the following properties:

Theorem 4. For any routing problem $(M, \Pi)$ with given sources and destinations, there exists a direct routing schedule with routing time $rt = O(d^2 C^* \log^2 n + m \log n)$, w.h.p.
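Substituting the bounds above into Theorem 3 shows where the two terms of Theorem 4 come from (a worked step using the w.h.p. bounds on $C$ and $b$):

$$rt \le 4(b+1)(C-1) + D = O(d\log n)\cdot O(dC^*\log n) + O(m\log n) = O(d^2 C^* \log^2 n + m\log n).$$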

Let $D^*$ denote the maximum length of the shortest paths between sources and destinations for the packets in $\Pi$; $D^*$ is the minimum possible dilation. Let $rt^*$ denote the optimal routing time (direct or not). For any set of paths, $C + D = \Omega(C^* + D^*)$, and so the optimal routing time $rt^*$ is also $\Omega(C^* + D^*)$. If $D^* = \Omega(m/(d^2 \log n))$, then $rt^* = \Omega(C^* + m/(d^2 \log n))$, so Theorem 4 implies:

Corollary 1. If $D^* = \Omega(m/(d^2 \log n))$, then there exists a direct routing schedule with $rt = O(rt^* d^2 \log^2 n)$, w.h.p.
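The corollary follows by absorbing the second term of Theorem 4. The hypothesis $D^* = \Omega(m/(d^2\log n))$ means $m\log n = O(D^* d^2 \log^2 n)$, and $rt^* = \Omega(C^* + D^*)$, so

$$rt = O(d^2 C^* \log^2 n + D^* d^2 \log^2 n) = O\big(d^2 \log^2 n\,(C^* + D^*)\big) = O(rt^*\, d^2 \log^2 n).$$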



3.3 Butterfly and Hypercube 

We consider the $n$-input butterfly network $B$, where $n$ is a power of 2 [14]. There is a unique path of length $\lg n + 1$ from an input node to an output node. Assume that every input node is the source of one packet and the destinations are randomly chosen.

For packet $\pi_i$, we consider the Bernoulli random variables $X_{ij}$, which are one if packet $\pi_j$ collides with $\pi_i$. Then the degree of $\pi_i$ in the dependency graph is $X_i = \sum_j X_{ij}$. We show that $\mathbf{E}[X_i] = \frac{1}{4}(\lg n - 1)$, and since the $X_{ij}$ are independent, we use the Chernoff bound to get a concentration result on $X_i$. Thus, we show that w.h.p. $\max_i X_i = O(\lg n)$. Since the injection time assigned by the greedy algorithm to any packet is at most its degree in the dependency graph, the routing time of the greedy algorithm for a random destination problem on the butterfly satisfies $\mathbf{P}[rt_{\mathcal{O}}(B,\Pi,\mathcal{P}) \le \frac{5}{2}\lg n] > 1 - 2\sqrt{2}\,n^{-1/2}$ (the details are given in an appendix).
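The structure of this argument can be explored with a small Python sketch. It assumes one particular butterfly convention: in a $2^d$-input butterfly, the edge leaving level $l$ sets bit $d-1-l$ of the row to the destination's bit (most significant bit first), so a path has $d = \lg n$ edges in this model. It also uses a simple greedy rule (packets in index order, each given the smallest injection time not used by an already scheduled neighbor), which may differ in detail from the greedy algorithm analyzed in the paper. Because every packet crosses the edge leaving level $l$ exactly $l$ steps after injection, two packets collide precisely when their paths share an edge and their injection times are equal, so the dependency graph is an ordinary graph here.

```python
import random


def butterfly_path(src, dst, d):
    """Edges (level, row_in, row_out) of the path in a 2**d-input butterfly.

    Assumed convention: the edge leaving level `level` fixes bit
    d - 1 - level of the row to the destination's bit (MSB first).
    """
    row, edges = src, []
    for level in range(d):
        mask = 1 << (d - 1 - level)
        nxt = (row & ~mask) | (dst & mask)
        edges.append((level, row, nxt))
        row = nxt
    return edges


def dependency_graph(dests, d):
    """Packet i starts at input row i and goes to output row dests[i].

    A packet crosses the edge leaving level l at time t_i + l, so two
    packets collide iff their paths share an edge and t_i == t_j.
    """
    paths = [set(butterfly_path(i, t, d)) for i, t in enumerate(dests)]
    n = len(dests)
    return {i: {j for j in range(n) if j != i and paths[i] & paths[j]}
            for i in range(n)}


def greedy_injection_times(adj):
    """Smallest injection time not used by an already scheduled neighbor."""
    times = {}
    for i in sorted(adj):
        used = {times[j] for j in adj[i] if j in times}
        t = 0
        while t in used:
            t += 1
        times[i] = t  # t <= number of scheduled neighbors <= degree
    return times


d = 8
n = 2 ** d
rng = random.Random(1)
adj = dependency_graph([rng.randrange(n) for _ in range(n)], d)
times = greedy_injection_times(adj)
# maximum degree, and routing time (largest injection time plus d steps)
print(max(len(a) for a in adj.values()), max(times.values()) + d)
```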

Valiant [15,16] proposed permutation routing on butterfly-like networks by 
connecting two butterflies, with the outputs of one as the inputs to the other. 
The permutation is routed by first sending the packets to random destinations 
on the outputs of the first butterfly. This approach avoids hot-spots and converts 
the permutation problem to two random destinations problems. Thus, we can 
apply the result for random destinations twice to obtain the following theorem: 

Theorem 5. For permutation routing on the double-butterfly with random intermediate destinations, the routing time of the greedy algorithm satisfies $\mathbf{P}[rt_{\mathcal{O}} \le 5 \lg n] > 1 - 4\sqrt{2}\,n^{-1/2}$.

A similar analysis holds for the hypercube network (see appendix). 

Theorem 6. For a permutation routing using random intermediate destinations and bit-fixing paths, the routing time of the greedy algorithm satisfies $\mathbf{P}[rt_{\mathcal{O}} \le 14 \lg n] > 1 - 1/(16n)$.

4 Computational Complexity of Direct Routing 

In this section, we show that direct routing and approximate versions of it are NP-complete. First, we introduce the formal definition of the direct routing decision problem. In our reductions, we will use the well-known NP-complete problem Vertex Color, the vertex coloring problem [17], which asks whether a given graph $G$ is $K$-colorable. The chromatic number $\chi(G)$ is the smallest $K$ for which $G$ is $K$-colorable. An algorithm approximates $\chi(G)$ with approximation ratio $q(G)$ if on any input $G$, the algorithm outputs $u(G)$ such that $\chi(G) \le u(G)$ and $u(G)/\chi(G) \le q(G)$. Typically, $q(G)$ is expressed only as a function of the number of vertices in $G$. It is known [18] that unless P = NP*, there does not exist a polynomial time algorithm to approximate $\chi(G)$ with approximation ratio $N^{1/2-\epsilon}$ for any constant $\epsilon > 0$.

By polynomially reducing coloring to direct routing, we will obtain hardness and inapproximability results for direct routing. We now formally define a generalization of the direct routing decision problem which allows for collisions. We say that an injection schedule is a valid $K$-collision schedule if at most $K$ collisions occur during the course of the routing (a collision is counted for every collision of every pair of packets on every edge).

Problem: Approximate Direct Route 

Input: A direct routing problem $(G, \Pi, \mathcal{P})$ and integers $T, K \ge 0$.

Question: Does there exist a valid $k$-collision direct routing schedule $\mathcal{T}$ for some $k \le K$ and with maximum injection time $T_{\max} \le T$?

* It is also known that if NP $\not\subseteq$ ZPP, then $\chi$ is inapproximable to within a larger factor; however, we cannot use this result as it requires both upper and lower bounds.
The problem Direct Route is the restriction of Approximate Direct Route to instances where $K = 0$. Denoting the maximum injection time of a valid $K$-collision injection schedule by $T$, we define the $K$-collision injection number $\tau_K(G,\Pi,\mathcal{P})$ for a direct routing problem as the minimum value of $T$ for which a valid $K$-collision schedule exists. We say that a schedule approximates $\tau_K(G,\Pi,\mathcal{P})$ with ratio $q$ if it is a schedule with at most $K$ collisions and the maximum injection time for this schedule approximates $\tau_K(G,\Pi,\mathcal{P})$ with approximation ratio $q$. We now show that direct routing is NP-hard.



Theorem 7 (Direct Route is NP-Hard). There exists a polynomial time reduction from any instance $(G, K)$ of Vertex Color to an instance $(G', \Pi, \mathcal{P}, T = K-1)$ of Direct Route.



Sketch of Proof. We will use the direct routing problem illustrated in Fig. 2, for which the dependency graph is a synchronized clique. Each path is associated with a level, which denotes the x-coordinate at which the path moves vertically up after making its final left turn. There is a path for every level in $[0, L]$, and the total number of packets is $N = L + 1$. The level-$\ell$ path for $\ell > 0$ begins at $(1-\ell, \ell-1)$ and ends at $(\ell, L+\ell)$, and is constructed as follows. Beginning at $(1-\ell, \ell-1)$, the path moves right until $(0, \ell-1)$, then alternates between up and right moves ($\ell$ of each) until it reaches level $\ell$ at node $(\ell, 2\ell-1)$, at which point the path moves up to $(\ell, L+\ell)$. Every packet is synchronized with every other packet, and meets every other packet exactly once.

Fig. 2. A mesh routing problem.
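The construction can be checked mechanically. The Python sketch below builds the level-$\ell$ paths for $\ell \ge 1$ as described (the level-0 path is not spelled out above, so it is left out) and reads "synchronized and meeting exactly once" as: when all packets are injected at the same time, each pair shares exactly one directed edge and crosses it at the same step. The function names are illustrative.

```python
from itertools import combinations


def level_path(level, L):
    """Directed mesh edges of the level-`level` path (level >= 1)."""
    x, y = 1 - level, level - 1
    edges = []

    def move(dx, dy):
        nonlocal x, y
        edges.append(((x, y), (x + dx, y + dy)))
        x, y = x + dx, y + dy

    while x < 0:            # right along row level-1 up to (0, level-1)
        move(1, 0)
    for _ in range(level):  # alternate up/right moves up to (level, 2*level-1)
        move(0, 1)
        move(1, 0)
    while y < L + level:    # straight up to (level, L+level)
        move(0, 1)
    return edges


def meetings(p, q):
    """(edge, step) pairs where packets on p and q, injected together, collide."""
    return {(e, k) for k, e in enumerate(p)} & {(e, k) for k, e in enumerate(q)}


L = 6
paths = {lvl: level_path(lvl, L) for lvl in range(1, L + 1)}
print(all(len(meetings(paths[a], paths[b])) == 1
          for a, b in combinations(paths, 2)))
```

The level-$\ell$ path has $L + 2\ell$ edges, so the longest one has $3L$ edges, which matches the dilation $D = 3L$ used in the proof of Theorem 9.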

Given an instance $I = (G, K)$ of Vertex Color, we reduce it in polynomial time to an instance $I' = (G', \Pi, \mathcal{P}, T = K-1)$ of Direct Route. Each node in $G$ corresponds to a packet in $\Pi$. The paths are initially as illustrated in the routing problem above with $L = N - 1$. The transformed problem will have a dependency graph $\mathcal{D}$ that is isomorphic to $G$; thus a coloring of $G$ implies a schedule for $\mathcal{D}$ and vice versa.

If $(u, v)$ is not an edge in $G$, then we remove that edge from the dependency graph by altering the paths of $u$ and $v$ without affecting their relationship to any other paths; we do so by altering the edges of $G'$ so that one of the two paths passes above the other, thus avoiding the conflict. After this construction, the resulting dependency graph is isomorphic to $G$.



Direct Route is in NP, as one can check the validity and routing time of a direct routing schedule by traversing every pair of packets, so Direct Route is NP-complete. Further, the reduction is gap preserving with gap preserving parameter $\rho = 1$ [19].
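The membership argument amounts to a polynomial-time checker. In the Python sketch below (the names are illustrative), a path is a list of directed edges, and a packet injected at time $t$ is assumed to cross the $k$-th edge of its path at time $t + k$. Every pair of packets on the same edge at the same time counts as one collision, matching the definition of a $K$-collision schedule, and the pairwise loop keeps the check polynomial in the number of packets and the dilation.

```python
from itertools import combinations


def count_collisions(paths, injection):
    """Number of collisions of a direct routing schedule.

    paths[i] is the list of directed edges of packet i in travel order and
    injection[i] its injection time; packet i is assumed to cross
    paths[i][k] at time injection[i] + k.  A pair of packets counts once
    for every edge they occupy at the same time.
    """
    busy = [{(e, injection[i] + k) for k, e in enumerate(p)}
            for i, p in enumerate(paths)]
    return sum(len(busy[i] & busy[j])
               for i, j in combinations(range(len(paths)), 2))


def is_valid_schedule(paths, injection, K=0, T=None):
    """Valid K-collision schedule with maximum injection time at most T."""
    ok_time = T is None or max(injection) <= T
    return ok_time and count_collisions(paths, injection) <= K
```

With `K=0` this is the check for Direct Route; a positive `K` gives the check for Approximate Direct Route.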
Theorem 8 (Inapproximability of Collision Injection Number). A polynomial time algorithm that approximates $\tau_K(G,\Pi,\mathcal{P})$ with ratio $r$ for an arbitrary direct routing problem yields a polynomial time algorithm that approximates the chromatic number of an arbitrary graph with ratio $r + K + 1$. In particular, choosing $K = O(r)$ preserves the approximation ratio.

For $K = O(N^{1/2-\epsilon})$, since $\chi$ is inapproximable with ratio $O(N^{1/2-\epsilon})$, we have:

Corollary 2 (Inapproximability of Scheduling). Unless P = NP, for $K = O(N^{1/2-\epsilon})$, there is no polynomial time algorithm to determine a valid $K$-collision direct routing schedule that approximates $\tau_K(G,\Pi,\mathcal{P})$ with ratio $O(N^{1/2-\epsilon})$, for any $\epsilon > 0$.

5 Lower Bounds for Buffering 

Here we consider the buffering requirements of any routing algorithm. We construct a "hard" routing problem for which any direct routing algorithm has routing time $rt = \Omega(C \cdot D) = \Omega((C + D)^2)$, which is asymptotically worse than optimal. We then analyze the amount of buffering that would be required to attain near optimal routing time, which results in a lower bound on the amount of buffering needed by any store-and-forward algorithm.

Theorem 9 (Hard Routing Problem). For every direct routing algorithm, there exist routing problems for which the routing time is $\Omega(C \cdot D) = \Omega((C + D)^2)$.

Proof. We construct a routing problem for which the dependency graph is a synchronized clique. The paths are as in Figure 2, and the description of the routing problem is in the proof of Theorem 7. The only difference is that $c$ packets use each path. The congestion is $C = 2c$ and the dilation is $D = 3L$. Since every pair of packets is synchronized, Lemma 1 implies that $rt(G,\Pi,\mathcal{P}) \ge N$. Since $N = c(L+1) = \frac{C}{2}\left(\frac{D}{3} + 1\right)$, $rt(G,\Pi,\mathcal{P}) = \Omega(C \cdot D)$. Choosing $c = \Theta(\sqrt{N})$ and $L = \Theta(\sqrt{N})$, we have that $C + D = \Theta(\sqrt{N})$, so $C + D = \Theta(\sqrt{C \cdot D})$.
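The counting in the proof can be written out explicitly, using only $C = 2c$, $D = 3L$ and $N = c(L+1)$:

$$rt(G,\Pi,\mathcal{P}) \ge N = c(L+1) = \frac{C}{2}\left(\frac{D}{3} + 1\right) \ge \frac{C \cdot D}{6}.$$

With $c = L = \Theta(\sqrt{N})$, both $C$ and $D$ are $\Theta(\sqrt{N})$, so $(C+D)^2 = \Theta(N) = \Theta(C \cdot D)$, and the bound reads $rt = \Omega((C+D)^2)$.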

Let problem A denote the routing problem in the proof of Theorem 9. We would like to determine how much buffering is necessary in order to decrease the routing time for routing problem A. Let $T$ be the maximum injection time (so the routing time is bounded by $T + D$). We give a lower bound on the number of packets that need to be buffered at least once:

Lemma 5. In routing problem A, if $T < \alpha$, then at least $N - \alpha$ packets are buffered at least once.

Sketch of Proof. If $\beta$ packets are not buffered at all, then they form a synchronized clique, hence $T \ge \beta$. Therefore $\alpha > \beta$, and since the remaining $N - \beta > N - \alpha$ packets are buffered at least once, the proof is complete.

If the routing time is $O(C + D)$, then $\alpha = O(C + D)$. Choosing $c$ and $L$ to be $\Theta(N^{1/2})$, we have that $\alpha = O(N^{1/2})$, and so from Lemma 5, the number of packets buffered is $\Omega(N)$:
Corollary 3. There exists a routing problem for which any algorithm will buffer $\Omega(N)$ packets at least once to achieve asymptotically optimal routing time.

Repeating problem A appropriately, we can strengthen this corollary to obtain:

Theorem 10 (Buffering-Routing Time Tradeoff). There exists a routing problem which, for any $\epsilon > 0$, requires buffering in order to obtain a routing time that is a factor $O(N^{\epsilon})$ from optimal.
Abstract. It was shown recently by Fakcharoenphol et al. [7] that arbitrary finite metrics can be embedded into distributions over tree metrics with distortion $O(\log n)$. It is also known that this bound is tight since there are expander graphs which cannot be embedded into distributions over trees with better than $\Omega(\log n)$ distortion.

We show that this same lower bound holds for embeddings into distributions over any minor excluded family. Given a family of graphs $F$ which excludes minor $M$ where $|M| = k$, we explicitly construct a family of graphs with treewidth-$(k+1)$ which cannot be embedded into a distribution over $F$ with better than $\Omega(\log n)$ distortion. Thus, while these minor excluded families of graphs are more expressive than trees, they do not provide asymptotically better approximations in general. An important corollary of this is that graphs of treewidth-$k$ cannot be embedded into distributions over graphs of treewidth-$(k-3)$ with distortion less than $\Omega(\log n)$.

We also extend a result of Alon et al. [1] by showing that for any $k$, planar graphs cannot be embedded into distributions over treewidth-$k$ graphs with better than $\Omega(\log n)$ distortion.



1 Introduction 

Many difficult problems can be approximated well when restricted to certain classes of metrics [13,3,4]. Therefore, low distortion embeddings into these restricted classes of metrics become very desirable. We will refer to the original metric which we would like to embed as the source metric and the metric into which we would like to embed as the target metric.

Tree metrics form one such class of desirable target metrics in the sense that many difficult problems become tractable when restricted to tree metrics. However, they are not sufficiently expressive; i.e., it has been shown that there are classes of metrics which cannot be approximated well by trees. In particular, Rabinovich and Raz [15] have proved that graph metrics cannot be embedded into trees with distortion better than girth/3 − 1.

Therefore, subsequent approaches have proposed the use of more expressive classes of metrics. Alon et al. [1] showed that any $n$-point metric can be embedded with $2^{O(\sqrt{\log n \log\log n})}$ distortion into distributions over spanning trees. In so doing they demonstrated that such probabilistic metrics are more expressive than tree metrics. Bartal [2] formally defined probabilistic embeddings and proposed using distributions over arbitrary dominating tree metrics. He showed that any finite metric could be embedded into distributions over trees with $O(\log^2 n)$ distortion. He subsequently improved this bound to $O(\log n \log\log n)$ [3].

This path culminated recently in the result of Fakcharoenphol et al. [7], in which they improved this bound to $O(\log n)$ distortion. This upper bound is known to be tight since there exist graphs which cannot be embedded into such distributions with better than $\Omega(\log n)$ distortion. This lower bound follows naturally from the fact that any distribution over trees can be embedded into $\ell_1$ with constant distortion, and the existence of expander graphs which cannot be embedded into $\ell_1$ with distortion better than $\Omega(\log n)$. Surprisingly, Gupta et al. [8] showed that this same bound is in fact achieved by source graph metrics of treewidth-2. Their result is also implicit in the work of Imase and Waxman [10], where they establish a lower bound on the competitive ratio of online Steiner trees.

It is plausible to hypothesize that there are more general and expressive classes of target metrics. We explore using distributions over minor closed families of graphs and show that asymptotically, they are no stronger than distributions over trees. We show an $\Omega(\log n)$ lower bound on the distortion even when the source metrics are graphs of low treewidth. More precisely, for any minor $M$, where $|M| = k$, we exhibit a construction for an infinite class of finite metrics of treewidth-$(k+1)$ for which any embedding into distributions over families of graphs excluding $M$ achieves $\Omega(\log n)$ distortion. A corollary of this fact is that treewidth-$k$ graphs cannot be embedded into distributions over treewidth-$(k-3)$ graphs with distortion less than $\Omega(\log n)$.

A weaker result can be inferred directly from Rao [16], who proved that any minor closed family can be embedded into $\ell_1$ with distortion $O(\sqrt{\log n})$. Consequently, the expanders exhibited by Linial et al. [13] cannot be embedded into distributions over minor closed families with distortion less than $\Omega(\sqrt{\log n})$.

One can derive a lower bound of $\Omega(\log n)$ by combining the method of Klein et al. [11] for decomposing minor excluded graphs with Bartal's [2] proof of the lower bound for probabilistic embeddings into trees. This bound also follows from the recent paper of Rabinovich on the average distortion of embeddings into $\ell_1$. However, we show this same bound holds for embeddings of simple, low-treewidth source metrics.

We continue by exploring the difficulty of embedding planar graph metrics into distributions over other minor excluded graph families. Alon et al. showed that any embedding of a 2-dimensional grid into a distribution over spanning subtrees must have distortion $\Omega(\log n)$. We show that for any fixed $k$, the same lower bound holds for embeddings of 2-dimensional grids into distributions over dominating treewidth-$k$ graph metrics.

Note that our $\Omega(\log n)$ lower bounds hide polynomial factors of $k$.



1.1 Techniques 

We employ Yao's MiniMax principle to prove both lower bounds: it suffices to show that for some distribution over the edges of the source graph, any embedding of the source graph into a dominating target graph has large expected distortion. In both cases the expected distortion is shown to be logarithmic in the number of vertices in the graph.

For the main result, we show that given any minor $M$, one can recursively build a family of source graphs which guarantee a large expected distortion when embedded into a graph which excludes $M$ as a minor. This structure is detailed in Section 3. While we use some elements from the construction of Gupta et al. [8], the main building block in our recursive construction is a stretched clique rather than the simpler treewidth-2 graphs used by Gupta et al., and hence our proof is more involved. As a step in our proof, we show that if graph $G$ does not contain $H$ as a minor, then subdivisions of $H$ cannot be embedded into $G$ with a small distortion.

To show the lower bound for embedding planar graphs into bounded treewidth graphs, we use the notion of nice tree decompositions [5,12] of bounded treewidth graphs. We show that for a range of set sizes, the subsets of a 2-dimensional grid have much larger separators than do their embeddings in the nice tree decompositions; this suffices to prove an $\Omega(\log n)$ lower bound with a little more work. This argument is presented in Section 4.



2 Definitions and Preliminaries 



Given two metric spaces $(G, \nu)$ and $(H, \mu)$ and an embedding $\Phi: G \to H$, we say that the distortion of the embedding is $\|\Phi\| \cdot \|\Phi^{-1}\|$, where

$$\|\Phi\| = \max_{x,y \in G} \frac{\mu(\Phi(x), \Phi(y))}{\nu(x,y)}, \qquad \|\Phi^{-1}\| = \max_{x,y \in G} \frac{\nu(x,y)}{\mu(\Phi(x), \Phi(y))},$$

and $(G, \nu)$ $\alpha$-approximates $(H, \mu)$ if the distortion is no more than $\alpha$. We say that $\mu$ dominates $\nu$ if $\mu(\Phi(x), \Phi(y)) \ge \nu(x,y)$ for all $x, y$.
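For finite metrics, the two definitions above can be evaluated directly. This Python sketch is our own illustration: the metrics are passed as distance functions, and $\Phi$ is assumed to be injective, so that every distance between distinct points is positive.

```python
from itertools import combinations


def distortion(points, d_src, d_tgt, phi):
    """Distortion ||Phi|| * ||Phi^{-1}|| of phi on a finite source metric.

    Assumes phi is injective, so all distances between distinct points
    are positive in both metrics.
    """
    expansion = contraction = 0.0
    for x, y in combinations(points, 2):
        src, tgt = d_src(x, y), d_tgt(phi(x), phi(y))
        expansion = max(expansion, tgt / src)      # ||Phi||
        contraction = max(contraction, src / tgt)  # ||Phi^{-1}||
    return expansion * contraction


def dominates(points, d_src, d_tgt, phi):
    """True if the target distances dominate the source distances under phi."""
    return all(d_tgt(phi(x), phi(y)) >= d_src(x, y)
               for x, y in combinations(points, 2))
```

For example, mapping the 4-cycle identically onto a path $0, 1, 2, 3$ is dominating, and its distortion is 3, coming from the pair $(0, 3)$.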



Definition 1. Given a graph $G = (V_G, E_G)$, a tree $T = (V_T, E_T)$ and a collection $\{X_i \mid i \in V_T\}$ of subsets of $V_G$, then $(\{X_i\}, T)$ is said to be a tree decomposition of $G$ if

1. $\bigcup_{i \in V_T} X_i = V_G$,

2. for each edge $e \in E_G$, there exists $i \in V_T$ such that the endpoints of $e$ are in $X_i$, and

3. for all $i, j, k \in V_T$: if $j$ lies on the path from $i$ to $k$ in $T$, then $X_i \cap X_k \subseteq X_j$.

The width of a tree decomposition is $\max_{i \in V_T} |X_i| - 1$. The treewidth of a graph $G$ is the minimum width over all tree decompositions of $G$.
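A small checker for Definition 1 makes the three conditions concrete. The Python sketch below uses illustrative names: the bags are a dictionary of sets, and `tree_edges` is assumed to form a tree, which the code does not verify. It tests condition 3 in its standard equivalent form: for each vertex $v$ of $G$, the tree nodes whose bags contain $v$ induce a connected subtree of $T$.

```python
from collections import deque


def is_tree_decomposition(graph_vertices, graph_edges, bags, tree_edges):
    """Check Definition 1 for bags {i: X_i} on the tree given by tree_edges."""
    # 1. every vertex of G lies in some bag
    if set().union(*bags.values()) != set(graph_vertices):
        return False
    # 2. both endpoints of every edge of G lie in a common bag
    if not all(any({u, v} <= X for X in bags.values()) for u, v in graph_edges):
        return False
    # 3. running intersection: bags containing v form a connected subtree
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in graph_vertices:
        holders = {i for i, X in bags.items() if v in X}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i] & holders:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(X) for X in bags.values()) - 1
```

For the 4-cycle $a, b, c, d$, the two bags $\{a, b, c\}$ and $\{a, c, d\}$ joined by one tree edge form a tree decomposition of width 2.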



Definition 2. A tree decomposition $(\{X_i\}, T = (V_T, E_T))$ of a graph $G = (V_G, E_G)$ is said to be a nice decomposition if:

1. $T$ is a rooted binary tree,

2. if node $i \in V_T$ has two children $j_1, j_2$, then $X_{j_1} = X_{j_2} = X_i$,

3. if node $i \in V_T$ has one child $j$, then either $X_i \subset X_j$ and $|X_i| = |X_j| - 1$, or $X_j \subset X_i$ and $|X_j| = |X_i| - 1$,

4. if node $i \in V_T$ is a leaf, then $|X_i| = 1$.



Proposition 1. [5,12] If graph G has a tree decomposition of width k, then G 
has a nice tree decomposition of width k. 

The following property follows directly from the definition of a tree decom- 
position: 

Proposition 2. For all $i, j, k \in V_T$: if $j$ lies on the path from $i$ to $k$ in $T$, and $x_1 \in X_i$ and $x_2 \in X_k$, then any path connecting $x_1$ and $x_2$ in $G$ must contain a node in $X_j$.

For a more complete exposition of treewidth see [12,5]. 

Definition 3. Given a graph $H$, a $k$-subdivision of $H$ is defined as the graph resulting from the replacement of each edge by a path of length $k$.

We will use $G_{n,m}$ to denote the $m$-subdivision of $K_n$. We observe that $\mathrm{treewidth}(G_{n,m}) = n - 1 = \mathrm{treewidth}(K_n)$.
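As an illustration of Definition 3, the following Python sketch builds the edge list of a $k$-subdivision and of $G_{n,m}$. The labels given to the new internal vertices are our own convention.

```python
from itertools import combinations


def subdivide(edges, k):
    """k-subdivision: replace every edge (u, v) by a path of length k.

    New internal vertices are labelled (u, v, t) for t = 1, ..., k - 1.
    """
    out = []
    for u, v in edges:
        chain = [u] + [(u, v, t) for t in range(1, k)] + [v]
        out.extend(zip(chain, chain[1:]))
    return out


def subdivided_clique(n, m):
    """G_{n,m}: the m-subdivision of the complete graph K_n."""
    return subdivide(list(combinations(range(n), 2)), m)
```

$G_{n,m}$ has $m\binom{n}{2}$ edges and $n + (m-1)\binom{n}{2}$ vertices.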

3 Distributions over Minor Excluded Families

3.1 Outline 

Given a family $F$ which excludes minor $M$ where $|M| = k$, we recursively construct a family of graphs $H_i$ which embed into any graph in $F$ with average distortion $\Omega(\log n)$, where $n = |H_i|$. Then by Yao's MiniMax principle, any embedding into a distribution over $F$ must have distortion $\Omega(\log n)$ as claimed.






D.E. Carroll and A. Goel 



3.2 Results 

Before proceeding with our main result, we need the following technical lemma¹.

Lemma 1. Let $G, H$ be graphs such that $G$ does not contain $H$ as a minor and let $J$ be the $k$-subdivision of $H$. If $f$ is an embedding of the metric of $J$ into the metric of $G$, then $f$ has distortion at least $k/6 - 1/2$.

Proof. We will use the linear extension framework of Rabinovich and Raz [15].
While we borrow some of their techniques, we have not found a reduction that 
would use their lower bounds (on the distortion of an embedding of a graph 
into another with a smaller Euler characteristic) as a black-box to prove this 
technical lemma. 

The linear extension $\bar{Q}$ of a graph $Q$ is obtained by identifying with each edge of $Q$ a line of length 1. All the points on all the lines belong to $\bar{Q}$. The metric $d_Q$ can be extended to a metric $\bar{d}_Q$ in the following natural fashion. If $x$ and $y$ lie on the same line $(u, v)$, then $\bar{d}_Q(x, y)$ is merely their distance on the line. If $x$ lies on the line $(u, v)$ and $y$ lies on a different line $(w, r)$, then

$$\bar{d}_Q(x, y) = \min\{\bar{d}_Q(x,u) + d_Q(u,w) + \bar{d}_Q(w,y),\ \bar{d}_Q(x,v) + d_Q(v,w) + \bar{d}_Q(w,y),\ \bar{d}_Q(x,u) + d_Q(u,r) + \bar{d}_Q(r,y),\ \bar{d}_Q(x,v) + d_Q(v,r) + \bar{d}_Q(r,y)\}.$$
We will refer to the original vertices as vertices of $\bar{Q}$ and to the original edges of $Q$ as the lines of $\bar{Q}$. We can now extend the embedding $f$ into a continuous map $\bar{f}: \bar{J} \to \bar{G}$ such that

1. $\bar{f}$ agrees with $f$ on vertices of $J$, and

2. if $x \in (u, v)$, where $(u, v)$ is a line of $\bar{J}$, then $\bar{f}(x)$ lies on a shortest path from $f(u)$ to $f(v)$ in $\bar{G}$ such that $\bar{d}_J(x,u) / \bar{d}_J(x,v) = \bar{d}_G(\bar{f}(x), f(u)) / \bar{d}_G(\bar{f}(x), f(v))$.

Since $\bar{f}$ is continuous, the entire line $(u, v)$ in $\bar{J}$ must be mapped to a single shortest path from $f(u)$ to $f(v)$ in $\bar{G}$. We will now assume that the distortion $\alpha$ of $\bar{f}$ is less than $k/6 - 1/2$ and derive a contradiction. But first, we need to state and prove the following useful claim:

Claim. If the points $\bar{f}(x)$ and $\bar{f}(y)$ lie on the same line in $\bar{G}$, the points $x, x'$ lie on the same line in $\bar{J}$, and the points $y, y'$ lie on the same line in $\bar{J}$, then $\bar{d}_J(x', y') \le 2\alpha + 1$.

Proof. Suppose $x, x'$ and $y, y'$ lie on lines $(p, q)$ and $(r, s)$ in $\bar{J}$, respectively. Use $X$ to denote the quantity $\bar{d}_G(\bar{f}(x), \bar{f}(y))$ and $Y$ to denote the quantity $\bar{d}_G(f(p), \bar{f}(x)) + \bar{d}_G(\bar{f}(x), f(q)) + \bar{d}_G(f(r), \bar{f}(y)) + \bar{d}_G(\bar{f}(y), f(s))$. Since $\bar{f}(x)$ and $\bar{f}(y)$ lie on the shortest paths from $f(p)$ to $f(q)$ and from $f(r)$ to $f(s)$, respectively, we have $d_G(f(p), f(q)) + d_G(f(r), f(s)) = Y$. Also, by the triangle inequality, we have $d_G(f(p), f(r)) + d_G(f(p), f(s)) + d_G(f(q), f(r)) + d_G(f(q), f(s)) \le 4X + 2Y$. Clearly, $X \le 1$ and $Y \ge 2$, so we have

$$\frac{d_G(f(p), f(r)) + d_G(f(p), f(s)) + d_G(f(q), f(r)) + d_G(f(q), f(s))}{d_G(f(p), f(q)) + d_G(f(r), f(s))} \le 4.$$

Since the distortion is $\alpha$, we must have

$$\frac{d_J(p, r) + d_J(p, s) + d_J(q, r) + d_J(q, s)}{d_J(p, q) + d_J(r, s)} \le 4\alpha.$$

But $d_J(p, q) = d_J(r, s) = 1$. Also, $d_J(p, r) + d_J(p, s) + d_J(q, r) + d_J(q, s) \ge 4\bar{d}_J(x', y') - 4$. Hence, we have $(4\bar{d}_J(x', y') - 4)/2 \le 4\alpha$, or $\bar{d}_J(x', y') \le 2\alpha + 1$.

¹ This lemma appears to be folklore, but we have not been able to find a proof in the literature. Our proof is non-trivial and might be useful in other contexts, so we have provided the proof in this paper.

We will now continue with the proof of Lemma 1. For any edge $(u, v) \in H$, consider the $u$-$v$ path of length $k$ in $J$. Consider the image of this path in $\bar{G}$ under $\bar{f}$. The image need not be a simple path, but must contain a simple path from $f(u)$ to $f(v)$. We choose any such simple path arbitrarily, call it the representative path for $(u, v)$, and denote it by $P(u, v)$. Start traversing this path from $f(u)$ to $f(v)$ and stop at the first vertex $q$ which is the image of a point $x$ on the $u$-$v$ path in $\bar{J}$ such that $\bar{d}_J(u, x) \ge k/2 - \alpha - 1/2$. Let $(p, q)$ be the last edge traversed. We will call this the representative line of $(u, v)$ and denote it by $L(u, v)$. Consider a point $y$ on the $u$-$v$ path in $\bar{J}$ that maps to $p$. The choice of $p$ and $q$ implies that $\bar{d}_J(u, y) < k/2 - \alpha - 1/2$, and hence we can apply the Claim to conclude that $k/2 - \alpha - 1/2 \le \bar{d}_J(u, x) \le k/2 + \alpha + 1/2$. Now, for any point $z$ not on the $u$-$v$ path in $\bar{J}$, we have $\bar{d}_J(x, z) \ge k/2 - \alpha - 1/2$. Since we assumed $\alpha < k/6 - 1/2$, we have $\bar{d}_J(x, z) > 2\alpha + 1$, and hence $\bar{f}(z)$ cannot lie on the line $L(u, v)$.

Thus the representative line $L(u, v)$ has two nice properties: it lies on a simple path from $f(u)$ to $f(v)$, and it does not contain the image of any point not on the $u$-$v$ path in $\bar{J}$.

Now, we perform the following simple process in the graph $G$: for every edge $(u, v)$ of $H$, contract all the edges on the representative path $P(u, v)$ in $G$ except the representative edge $L(u, v)$. Since the representative line of an edge $(u, v)$ of $H$ does not intersect the representative path of any other edge of $H$, no representative edge gets contracted. The resulting graph is a minor of $G$, but also contains $H$ as a minor, which is a contradiction. Hence $\alpha \ge k/6 - 1/2$.

Now we can prove our main result: 

Theorem 1. Let $F$ be a minor closed family of graphs which excludes minor $M$ where $|V_M| = k$. There exists an infinite family of graphs $H_i$ with treewidth-$(k+1)$ such that any $\alpha$-approximation of the metric of $H_i$ by a distribution over dominating graph metrics in $F$ has $\alpha = \Omega(\log |H_i|)$.
Proof. Assume $k$ is odd. We proceed by constructing an infinite sequence $H_i$ of graphs of treewidth-$k$ and show that the minimum distortion with which they can be embedded into a distribution over graph metrics in $F$ grows without bound as $i$ increases.

First we construct $H_1$: take $K_k$ and attach each vertex $v$ to a source $s$ with $(k-1)/2$ disjoint paths of length $k$, and attach $v$ to a sink $t$ with $(k-1)/2$ paths of length $k$. Call $s$ and $t$ the terminals of $H_1$.

We now show that $H_1$ has $m = (2k+1)\binom{k}{2}$ edges and contains $\binom{k}{2}$ edge-disjoint paths of length $2k+1$. Label the $s$ to $v_i$ paths $sp_{i,1}$ to $sp_{i,(k-1)/2}$ and the $v_i$ to $t$ paths $tp_{i,1}$ to $tp_{i,(k-1)/2}$. Observe that the paths formed as follows, for $0 \le i < k$ and $1 \le j \le (k-1)/2$, are edge disjoint:

path $sp_{i,j}$ followed by edge $(v_i, v_{(i+2j) \bmod k})$ followed by path $tp_{(i+2j) \bmod k,\, j}$.

Note that $H_1$ has treewidth $\le k+1$.

The $H_i$ are constructed recursively: for each $i$, construct the graph $H_i$ by replacing every edge in $H_1$ with a copy of $H_{i-1}$. Therefore, $H_i$ has $(2k+1)^i \binom{k}{2}^i = m^i$ edges, and each pair of non-terminal vertices is connected by $\binom{k}{2}^{i-1}$ edge-disjoint paths of length $(2k+1)^{i-1}$. Also note that $H_i$ has treewidth $\le k+1$.
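The counts in this construction can be checked for small odd $k$. The Python sketch below builds $H_1$ (the vertex labels are our own convention), forms the $\binom{k}{2}$ paths $sp_{i,j}$, $(v_i, v_{(i+2j) \bmod k})$, $tp_{(i+2j) \bmod k, j}$, and lets one verify that they are edge disjoint and of length $2k+1$. It also gives the edge count $m^i$ of $H_i$.

```python
from itertools import combinations
from math import comb


def build_H1(k):
    """Edge list of H_1 for odd k, and its C(k,2) s-t paths as edge lists."""
    assert k % 2 == 1
    half = (k - 1) // 2
    edges = [(("v", a), ("v", b)) for a, b in combinations(range(k), 2)]

    def hanging_path(end, i, j):
        # path of length k from terminal `end` to v_i with fresh internal vertices
        chain = [end] + [(end, i, j, t) for t in range(1, k)] + [("v", i)]
        return list(zip(chain, chain[1:]))

    sp = {(i, j): hanging_path("s", i, j) for i in range(k) for j in range(1, half + 1)}
    tp = {(i, j): hanging_path("t", i, j) for i in range(k) for j in range(1, half + 1)}
    for path in list(sp.values()) + list(tp.values()):
        edges.extend(path)

    st_paths = []
    for i in range(k):
        for j in range(1, half + 1):
            w = (i + 2 * j) % k
            # sp_{i,j}, then the clique edge (v_i, v_w), then tp_{w,j} walked back to t
            st_paths.append(sp[(i, j)] + [(("v", i), ("v", w))] + tp[(w, j)][::-1])
    return edges, st_paths


def edges_of_H(k, i):
    """Number of edges of H_i, namely m**i with m = (2k + 1) * C(k, 2)."""
    return ((2 * k + 1) * comb(k, 2)) ** i
```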

As in [8], we use Yao's MiniMax principle to prove the lower bound. We show that there is a distribution $d$ over the edges of $H_i$ such that for any embedding of $H_i$ into a graph which excludes minor $M$, an edge chosen randomly from distribution $d$ has an expected distortion of $\Omega(i)$. Then by Yao's MiniMax principle, for any embedding of $H_i$ into a distribution over graphs in $F$, there must be an edge with expected distortion $\Omega(i)$. We shall assume a uniform distribution over the edges.

Let $U$ be a graph which excludes minor $M$ and let $\Phi: H_i \to U$ be an embedding of $H_i$ into $U$ such that distances in $U$ dominate their corresponding distances in $H_i$. For each edge $e \in H_i$ we will give $e$ the color $j$, $1 \le j \le i-1$, if $\Phi$ distorts $e$ by at least $\frac{1}{6}(2k+1)^j - \frac{1}{2}$. Note that an edge may have many colors.

Consider the copies of $G_{k,(2k+1)^{i-1}}$ in $H_i$. $H_i$ contains a copy of $K_k$ in which every edge has been replaced with a copy of $H_{i-1}$. Each copy of $H_{i-1}$ has $\binom{k}{2}^{i-1}$ edge disjoint paths of length $(2k+1)^{i-1}$. Thus, $H_i$ clearly contains at least $\binom{k}{2}^{i-1}$ edge disjoint copies of $G_{k,(2k+1)^{i-1}}$.

Clearly, each copy of $G_{k,(2k+1)^{i-1}}$ contains $K_k$, and therefore $M$, as a minor since $|M| = k$. Thus, by Lemma 1, at least one edge in each copy of $G_{k,(2k+1)^{i-1}}$ has distortion $\ge \frac{1}{6}(2k+1)^{i-1} - \frac{1}{2}$. Since $H_i$ comprises $m$ copies of $H_{i-1}$, it contains $m\binom{k}{2}^{i-2}$ copies of $G_{k,(2k+1)^{i-2}}$. Therefore, there are at least $m\binom{k}{2}^{i-2}$ edges with color $i-2$. In general, there are at least $m^{i-1-j}\binom{k}{2}^{j}$ edges with color $j$.

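The distortion of an edge is $\ge \frac{1}{6}(2k+1)^j - \frac{1}{2}$, where $j$ is the largest of the edge's colors. For each edge $e \in E_{H_i}$, let $C_e$ be the set of colors which apply to edge $e$. Clearly, for all $e \in E_{H_i}$,

$$\sum_{j \in C_e}\left(\tfrac{1}{6}(2k+1)^j - \tfrac{1}{2}\right) \le 2 \max_{j \in C_e}\left(\tfrac{1}{6}(2k+1)^j - \tfrac{1}{2}\right).$$

Thus, writing $\mathrm{dist}_\Phi(e)$ for the distortion of edge $e$,

$$\sum_{e \in E_{H_i}} \mathrm{dist}_\Phi(e) \ge \frac{1}{2}\sum_{j=1}^{i-1} \left|\{e \mid j \in C_e\}\right| \left(\tfrac{1}{6}(2k+1)^j - \tfrac{1}{2}\right) \ge \frac{1}{14}\sum_{j=1}^{i-1} m^{i-1-j}\binom{k}{2}^{j}(2k+1)^j = \frac{1}{14}\sum_{j=1}^{i-1} m^{i-1} = \frac{(i-1)\,m^{i-1}}{14} \ge \frac{i\,m^{i}}{28\,m}.$$

Then, since $H_i$ has $m^i$ edges, the average distortion is at least $\frac{i}{28m} = \Omega(\log |H_i|)$.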



4 Planar Graphs 

4.1 Outline 

First we show that given a 2-dimensional grid, there is a distribution over the edges such that any embedding into a treewidth-$k$ graph metric has high expected distortion. The proof builds on the work of Alon et al. [1]. By Yao's MiniMax principle, this is enough to show that the 2-dimensional grid cannot be embedded into a distribution over such graphs with distortion less than $\Omega(\log n)$.

4.2 Results 

In this section we will use $GRID_n$ to denote the planar graph consisting of the $n \times n$ grid.

Lemma 2. (From [1]) If $A$ is a set of $\beta^2$ vertices in $GRID_n$, where $\beta^2 \le n^2/2$, then there are at least $\beta$ rows that $A$ intersects but does not fill, or at least $\beta$ columns that $A$ intersects but does not fill.
Lemma 3. (Modified from [1]) If $A$ is a set of $\beta^2$ vertices in $GRID_n$, where $\beta^2 \le n^2/2$, and $B$ is a set of at most $\beta/4$ vertices in $A$, then there are at least $\beta/2$ vertices in $A$ that have neighbors outside $A$ and have distance at least from each vertex of $B$.

Proof. By Lemma 2, there is a set $C$ of at least $\beta$ vertices in $A$ which are in distinct rows or distinct columns and have neighbors outside of $A$. Since they are in distinct rows (or columns), a vertex of $B$ can be at distance less than of at most vertices in $C$. Thus, there are at least $\beta/2$ vertices at distance at least from each vertex in $B$.

Lemma 4. Let H be a graph of treewidth k, Φ : GRIDₙ → H be an embedding of GRIDₙ into H, and β ≤ n/4. Then there are at least n²/(24β) edges (u, v) such that d_H(u, v) ≥ β/(16(k + 1)).

Proof. Since H has treewidth k, it must have a tree decomposition of width k. Moreover, it must have a nice tree decomposition (Xᵢ, T) of width k by Proposition 1.

Given a subtree S of T, define $\mathrm{HSize}(S) = \bigl|\bigcup_{i \in V_S} X_i\bigr|$. Since (Xᵢ, T) is a nice decomposition, we know that T has a maximum degree of 3. We also know that for any two adjacent vertices i, j ∈ T, |Xᵢ \ Xⱼ| ≤ 1. Thus, every subtree S of T has an edge e such that removing e creates 2 subtrees each with HSize at least 1/3 · HSize(S). Note that if S₁ and S₂ are the two subtrees of T, then $\bigcup_{i \in V_{S_1}} X_i$ and $\bigcup_{i \in V_{S_2}} X_i$ are not disjoint.

Start with T and successively delete edges from the remaining component with the largest HSize such that the HSizes of the resulting subtrees are as evenly divided as possible. Do this until n²/(3β²) − 1 edges have been deleted and there are n²/(3β²) pieces. The smallest piece will always be at least 1/3 the HSize of the previous largest piece. Therefore, on average these pieces have HSize 3β², and the smallest will have HSize at least β².

Since each deleted edge of T is incident to 2 pieces, the average number of deleted edges incident with a piece is less than 2. Thus, at least half the pieces are incident with no more than 4 deleted edges.

Each deleted edge in T represents a set of points which form a vertex cut of H of size at most k + 1. Thus, there are at least n²/(6β²) pieces of HSize at least β² which are separated from the rest of H by a cut of size at most 4(k + 1). Let A be a piece of HSize at least β² and let B be the subset of size at most 4(k + 1) separating A from the rest of H. Then by Lemma 3, A has at least β/2 vertices with neighbors outside of the piece whose distance from the vertices of B is at least β/(16(k + 1)). Thus, there are at least n²/(24β) edges which are each distorted by a factor of at least β/(16(k + 1)).

Theorem 2. Any α-approximation of the metric of GRIDₙ by a distribution over dominating treewidth-k graphs has α = Ω(log n).
Proof. Let H be an arbitrary graph of treewidth k whose metric dominates that of GRIDₙ. By Lemma 4, for any β ≤ n/4 there are at least n²/(24β) edges which are distorted by at least β/(16(k + 1)). Let X be the distortion of an edge chosen uniformly at random from GRIDₙ. X can only take on non-negative integer values, so the expected distortion is $E[X] = \sum_{i \ge 1} \Pr(X \ge i)$. For $1 \le i \le \lfloor n/64(k+1) \rfloor$, let β = 16(k + 1)i, so that β ≤ n/4. Since GRIDₙ has 2n(n − 1) edges,

$$\begin{aligned}
E[X] &= \sum_{i \ge 1} \Pr(X \ge i) \\
&\ge \sum_{i=1}^{\lfloor n/64(k+1) \rfloor} \frac{n^2}{24 \cdot 16(k+1)i \cdot 2n(n-1)} \\
&\ge \sum_{i=1}^{\lfloor n/64(k+1) \rfloor} \frac{1}{2 \cdot 24 \cdot 16(k+1)i} \\
&= \frac{1}{768(k+1)} \sum_{i=1}^{\lfloor n/64(k+1) \rfloor} \frac{1}{i} \\
&= \Omega(\log n).
\end{aligned}$$



Since H was arbitrarily chosen, by Yao's MiniMax principle, if GRIDₙ is embedded into a distribution over treewidth-k graphs, there must be an edge with expected distortion Ω(log n).
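The final bound in the proof can be evaluated directly. The Python sketch below (the name `grid_distortion_lower_bound` is ours, not the authors') computes $\frac{1}{768(k+1)}\sum_{i=1}^{\lfloor n/64(k+1)\rfloor} 1/i$, assuming n ≥ 1 and k ≥ 0, to illustrate that the bound grows logarithmically in n with a small leading constant.

```python
import math


def grid_distortion_lower_bound(n: int, k: int) -> float:
    """Evaluate the lower bound on E[X] derived in the proof of Theorem 2.

    Sums 1/(768(k+1) i) over 1 <= i <= floor(n / (64(k+1))), i.e. over every
    i for which beta = 16(k+1)i still satisfies beta <= n/4.
    """
    if n < 1 or k < 0:
        raise ValueError("expected n >= 1 and k >= 0")
    upper = n // (64 * (k + 1))
    harmonic = math.fsum(1.0 / i for i in range(1, upper + 1))
    return harmonic / (768 * (k + 1))


for n in (2**10, 2**15, 2**20):
    print(n, grid_distortion_lower_bound(n, k=1))
```

Because the sum is a harmonic number, the value grows like ln(n)/(768(k + 1)) for large n; the constant is loose and only the Ω(log n) behaviour matters for the theorem.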



5 Conclusions 

It is interesting to note that the inability of minor closed families to approximate all graphs well supports the conjecture [8] [9] that minor closed families of graphs (and therefore distributions over such families) can be embedded into ℓ₁ with constant distortion. Since Linial et al. [13] showed a lower bound of Ω(log n) for embedding arbitrary graph metrics into ℓ₁, the conjecture further implies that there must be families of graphs which cannot be embedded into distributions over excluded minor graphs with distortion less than Ω(log n).

However, the particular inability of minor closed families to approximate 
other minor closed families also eliminates one potential approach to embedding these families into ℓ₁. Gupta et al. [8] showed that although treewidth-2 graphs and treewidth-1 graphs are both embeddable into ℓ₁ with constant distortion, treewidth-2 graphs are not embeddable into distributions over treewidth-1
graphs with constant distortion. We have shown that a similar structure holds 
for all higher treewidths. Thus, an approach which attempts to repeatedly embed 
bounded treewidth graphs into (distributions over) graphs with lower treewidth 
will not work. 






D.E. Carroll and A. Goel 



Acknowledgements. The authors would like to thank an anonymous reviewer 
for pointing out the aforementioned folklore result. We would also like to thank 
Elias Koutsoupias for valuable discussions, and Adam Meyerson and Shailesh 
Vaya for helpful comments on previous drafts. 



References 

1. N. Alon, R. Karp, D. Peleg, and D. West, “A Graph-Theoretic Game and its 
Application to the k-Server Problem”, SIAM J. Comput., 24:1 (1998), pp. 78-100. 

2. Y. Bartal, “Probabilistic approximation of metric spaces and its algorithmic ap- 
plications”, In Proceedings of the 37th Annual IEEE Symposium on Foundations 
of Computer Science, 1996, pp. 184-193. 

3. Y. Bartal, “On approximating arbitrary metrics by tree metrics”, In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, 1998, pp. 161-168.

4. B. Baker, “Approximation Algorithms for NP-complete Problems on Planar 
Graphs”, J. ACM, 41 (1994), pp. 153-180. 

5. H. Bodlaender, “A partial k-arboretum of graphs with bounded treewidth”, Theoretical Computer Science, 209 (1998), pp. 1-45.

6. R. Diestel, “Graph Theory”, Springer-Verlag, New York, 1997.

7. J. Fakcharoenphol, S. Rao, and K. Talwar, “A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics”, In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003, pp. 448-455.

8. A. Gupta, I. Newman, Y. Rabinovich, and A. Sinclair, “Cuts, trees and ℓ₁-embeddings”, In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 1999, pp. 399-408.

9. C. Chekuri, A. Gupta, I. Newman, Y. Rabinovich, and A. Sinclair, “Embedding k-Outerplanar Graphs into ℓ₁”, In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2003, pp. 527-536.

10. M. Imase and B. Waxman, “Dynamic Steiner Tree Problem.”, SIAM J. Discrete 
Math., 4 (1991), pp. 369-384. 

11. P. Klein, S. Plotkin, and S. Rao, “Excluded Minors, Network Decomposition, and 
Multicommodity Flow”, In Proceedings of the 25th Annual ACM Symposium on 
Theory of Computing, 1993, pp. 682-690. 

12. T. Kloks, “Treewidth: Computations and Approximations”, Lecture Notes in Computer Science, Vol. 842, Springer-Verlag, Berlin, 1994.

13. N. Linial, E. London, and Y. Rabinovich, “The geometry of graphs and some of 
its algorithmic applications”, Combinatorica, 15 (1995), pp. 215-245. 

14. Y. Rabinovich, “On Average Distortion of Embedding Metrics into the Line and into ℓ₁”, In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003, pp. 456-462.

15. Y. Rabinovich and R. Raz, “Lower Bounds on the Distortion of Embedding Finite 
Metric Spaces in Graphs”, Discrete & Computational Geometry, 19 (1998), pp. 
79-94. 

16. S. Rao, “Small distortion and volume preserving embeddings for Planar and Euclidean metrics”, In Proceedings of the 15th Annual Symposium on Computational Geometry, 1999, pp. 300-306.



A Parameterized Algorithm for Upward 
Planarity Testing 
(Extended Abstract) 



Hubert Chan* 

School of Computer Science 
University of Waterloo, Canada 
hySchanSuwaterloo . ca 



Abstract. Upward planarity testing, or checking whether a directed graph has a drawing in which no edges cross and all edges point upward, is NP-complete. All of the algorithms for upward planarity testing developed previously focused on special classes of graphs. In this paper we develop a parameterized algorithm for upward planarity testing that can be applied to all graphs and runs in $O(f(k)n^3 + g(k,\ell)n)$ time, where n is the number of vertices, k is the number of triconnected components, and ℓ is the number of cutvertices. The functions f(k) and g(k, ℓ) are defined as $f(k) = k!\,8^k$ and $g(k,\ell) = 2^{3\cdot 2^{\ell}} k^{3\cdot 2^{\ell}} k!\,8^k$. Thus if the number of triconnected components and the number of cutvertices are small, the problem can be solved relatively quickly, even for a large number of vertices. This is the first parameterized algorithm for upward planarity testing.



1 Introduction 

The area of graph drawing deals with geometric representations of abstract 
graphs, and has applications in many different areas such as software architec- 
ture, database design, project management, electronic circuits, and genealogy. 
These geometrical representations, known as graph drawings, represent each ver- 
tex as a point on the plane, and each edge as a curve connecting its two endpoints. 
Broader treatments of graph drawing are given by Di Battista et al. [6] and by 
Kaufmann and Wagner [21]. In our discussion of algorithmic results for graphs, 
we will use the number of vertices, n, as the input size. 

Upward planarity testing, or determining whether or not a directed graph can 
be drawn with no edge crossings such that all edges are drawn upward, is NP- 
complete [17]. An example of an upward planar drawing is shown in Figure 1. 
In this paper, we will show how to apply parameterized complexity techniques to solve upward planarity testing. We show how to test upward planarity in $O(k!\,8^k n^3 + 2^{3\cdot 2^{\ell}} k^{3\cdot 2^{\ell}} k!\,8^k n)$ time, where k is the number of triconnected components in the graph and ℓ is the number of cutvertices. If the graph is biconnected, we give an $O(k!\,8^k n^3)$-time algorithm. Thus if k is small, we can do upward planarity testing efficiently.
Fig. 1. An example of an upward planar drawing



A common approach to solve the problem of upward planarity testing is to 
look at special classes of graphs. Polynomial-time algorithms have been given for 
st-graphs [9], bipartite graphs [8], triconnected graphs [2], outerplanar graphs 
[22], and single-source graphs [3,19]. Di Battista and Liotta [7] give a linear time 
algorithm that checks whether a given drawing is upward planar. Combinatorial 
characterizations for upward planarity are given by Tamassia and Tollis [25] , Di 
Battista and Tamassia [9], Thomassen [27], and Hutton and Lubiw [19]. 

For some applications, upward planarity requirements may not be sufficient, 
or may be too strict. Jünger et al. [20] investigate proper layered drawings,
Bertolazzi et al. [1] introduce quasi-upward planarity, and Di Battista et al. [10] 
investigate the area requirements for upward drawings. 

Parameterized complexity, introduced by Downey and Fellows [12], is a technique to develop algorithms that are polynomial in the size of the input, but possibly exponential in one or more parameters, and hence efficient for small fixed parameter values. Although this is the first application of parameterized complexity to upward planarity testing, it has been applied to solve some other problems in graph drawing. Peng [23] investigates applying the concept of treewidth and pathwidth to graph drawing. Parameterized algorithms have been developed for layered graph drawings [13,14], three-dimensional drawings of graphs of bounded path-width [16], the one-sided crossing minimization problem [15], and the two-layer planarization problem [24].

This paper is organized as follows: in Section 2, we define terms that we use in this paper. In Section 3, we develop a parameterized algorithm for upward planarity testing in biconnected graphs. We will then show how to join together biconnected components: given two components G₁ and G₂ to be joined at vertices v₁ ∈ V(G₁) and v₂ ∈ V(G₂), we draw one component within a face of the other, draw an edge from v₁ to v₂, and contract the edge. Thus we must investigate the effects of edge contraction in upward planar graphs (Section 4). We also define a notion of node accessibility in Section 5, which will determine when the edge (v₁, v₂) can be drawn. In Section 6, we use the results from the previous sections to join together biconnected components, giving us a parameterized algorithm for upward planarity testing in general graphs.
2 Definitions

In this paper, we assume the reader is familiar with the standard definitions of 
graphs and directed graphs, which can be found in a basic graph theory book 
[4, 11]. Given a graph G, we denote by V(G) and E(G) the sets of its vertices
and edges, respectively. If there is no confusion as to which graphs we refer, 
we may simply write V and E. Unless otherwise specified, all graphs in this 
paper are directed simple graphs. In this paper, we adopt the convention of 
using lowercase Roman letters to represent vertices and lowercase Greek letters 
to represent edges. 

A graph is biconnected (triconnected) if between any two vertices, there are 
at least two (three) vertex-disjoint paths. A cutvertex is a vertex whose removal 
disconnects the graph. 

Given a vertex u in a directed graph, its indegree (outdegree), denoted deg⁻(u) (deg⁺(u)), is the number of incoming (outgoing) edges. A vertex with no incoming (outgoing) edges is called a source (sink).

A drawing φ of a graph G is a mapping of vertices to points on the plane,
and edges to curves such that the endpoints of the curve are located at the 
vertices incident to the edge. In the case of directed graphs, each curve has an 
associated direction corresponding to the direction of the edge. We say that a 
curve is monotone if every horizontal line intersects the curve at most once. A 
graph is planar if it has a drawing in which no edges cross. A directed graph is 
upward planar if it has a planar drawing in which all edges are monotone and 
directed upwards. 

A planar drawing partitions the plane into regions called faces. The infinite, 
or unbounded, face is called the outer face. 

A drawing φ defines, for each vertex v, a clockwise ordering of the edges around v. The collection Γ of the clockwise orderings of the edges for each vertex is called a (planar) embedding. If edge ε₂ comes immediately after ε₁ in the clockwise ordering around v, then we say that ε₁ and ε₂ are edge-ordering neighbours, ε₂ is the clockwise neighbour of ε₁ around v, and ε₁ is the counter-clockwise neighbour of ε₂.

In an upward planar drawing, all incoming edges to v must be consecutive, as must all outgoing edges [25]. Thus for each vertex, we can partition the clockwise ordering of its edges into two linear lists of outgoing and incoming edges. We call the collection of these lists an upward planar embedding. If v is a source (sink), then the first and last edges in the list of outgoing (incoming) edges specify a face. Using the terminology of Bertolazzi et al. [2], we say that v is a big angle on that face.

If a vertex v has incoming edge ε₁ and outgoing edge ε₂ that are edge-ordering neighbours, then we say that the face that contains ε₁ and ε₂ as part of its boundary is flattenable at v.
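Both big angles and flattenable faces are local to a vertex, so they can be read directly off an upward planar embedding. The sketch below is an illustration under our own assumptions: the two lists at v are passed as `outgoing` and `incoming`, and the clockwise cyclic order is taken to be the outgoing list followed by the incoming list. Each consecutive pair (a, b), where b is the clockwise neighbour of a, identifies the face between them, and the hypothetical function `classify_angles` labels it "big", "flattenable", or "other".

```python
from typing import Hashable, List, Sequence, Tuple


def classify_angles(outgoing: Sequence[Hashable],
                    incoming: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable, str]]:
    """Label each angle (a, b) at a vertex v, where b is clockwise after a."""
    order = list(outgoing) + list(incoming)  # assumed clockwise cyclic order
    result = []
    for i, a in enumerate(order):
        j = (i + 1) % len(order)
        b = order[j]
        a_out = i < len(outgoing)
        b_out = j < len(outgoing)
        if a_out != b_out:
            # One incoming and one outgoing neighbour: the face is flattenable at v.
            label = "flattenable"
        elif i == len(order) - 1:
            # Only reachable for a source or sink: the angle between the last
            # and first edges of its single list is the big angle.
            label = "big"
        else:
            label = "other"
        result.append((a, b, label))
    return result


print(classify_angles(["x", "y"], ["p"]))
```

For a vertex that is neither a source nor a sink, exactly two angles are flattenable; for a source or sink, exactly one angle is big.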

We contract an edge ε = (v, w) by removing v and w, along with their incident edges. We then add a new vertex v_ε, and for each edge (u, v) or (u, w) in E(G), we add the edge (u, v_ε). For each edge (v, u) or (w, u), we add the edge (v_ε, u). The resulting graph is denoted G/ε. Given an embedding Γ of G, we construct an embedding of G/ε, denoted Γ/ε, as follows: for each vertex u ∉ {v, w}, the clockwise ordering of edges around u remains the same. We then construct the clockwise ordering around v_ε by first adding the edges incident to v in order, starting with the clockwise neighbour of ε around v and ending with the counter-clockwise neighbour of ε, then adding the edges incident to w in order, starting with the clockwise neighbour of ε around w and ending with the counter-clockwise neighbour of ε. We note that Γ/ε may not be an upward planar embedding, even if Γ was; we investigate this further in Section 4.
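The construction of G/ε and Γ/ε is mechanical enough to sketch in code. The Python below is an illustration, not the author's implementation: it assumes an embedding is stored as a rotation system (a map from each vertex to its incident edge identifiers in clockwise order) plus a map from edge identifiers to (tail, head) pairs, and the caller supplies the name of the new vertex v_ε. It does not check whether the result is upward planar.

```python
from typing import Dict, Hashable, List, Tuple

Vertex = Hashable
EdgeId = Hashable
Rotation = Dict[Vertex, List[EdgeId]]       # clockwise edge order at each vertex
Ends = Dict[EdgeId, Tuple[Vertex, Vertex]]  # edge id -> (tail, head)


def contract_edge(rotation: Rotation, ends: Ends, eps: EdgeId,
                  new_vertex: Vertex) -> Tuple[Rotation, Ends]:
    """Return the rotation system and edge ends of G/eps, built as in Section 2."""
    v, w = ends[eps]

    def edges_after(x: Vertex) -> List[EdgeId]:
        # Start at the clockwise neighbour of eps around x and end at its
        # counter-clockwise neighbour, i.e. rotate the cyclic order past eps.
        order = rotation[x]
        i = order.index(eps)
        return order[i + 1:] + order[:i]

    # Orderings around every vertex other than v and w are unchanged.
    new_rotation = {u: list(order) for u, order in rotation.items()
                    if u not in (v, w)}
    new_rotation[new_vertex] = edges_after(v) + edges_after(w)

    # Redirect every remaining edge incident to v or w to the new vertex.
    new_ends: Ends = {}
    for e, (tail, head) in ends.items():
        if e == eps:
            continue
        new_ends[e] = (new_vertex if tail in (v, w) else tail,
                       new_vertex if head in (v, w) else head)
    return new_rotation, new_ends


rot = {"a": ["av"], "c": ["cv"], "b": ["wb"],
       "v": ["av", "vw", "cv"], "w": ["vw", "wb"]}
ends = {"av": ("a", "v"), "cv": ("c", "v"), "vw": ("v", "w"), "wb": ("w", "b")}
print(contract_edge(rot, ends, "vw", "x"))
```

Edges are kept by identifier, so if v and w had a common neighbour, the two resulting parallel edges stay distinct in this representation.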

3 Biconnected Components 

We first consider only biconnected components; in Section 6, we show how to 
connect these components together. Our goal for this section is to bound the 
number of possible planar embeddings of a biconnected graph by a function 
f(k), where k is the number of triconnected components in the graph, which
will allow us to obtain a parameterized algorithm for upward planarity testing. 

Theorem 1. Given a planar biconnected graph G that has k triconnected components, G has at most $k!\,8^{k-1}$ possible planar embeddings, up to reversal of all the edge orderings.

Proof. (outline) We first show that given two vertices in an embedded triconnected graph, there are at most two faces that contain both vertices. From this, we can show that given two triconnected components G₁ and G₂ of G that share a common vertex v, there are at most eight possible embeddings of G₁ ∪ G₂. The eight embeddings come from the fact that there are at most two faces of G₁ in which G₂ can be drawn (which are the two faces that contain both v and another vertex that lies on a path from G₁ to G₂), and vice versa, and that there are two possibilities for the edge orderings of G₂ with respect to G₁. From this, we can show that if G has k triconnected components G₁, ..., Gₖ that all share a common vertex, there are at most $(k-1)!\,8^{k-1}$ possible embeddings of G₁ ∪ ··· ∪ Gₖ. Since there are at most k shared vertices between triconnected components, we have at most $k \cdot (k-1)!\,8^{k-1} = k!\,8^{k-1}$ embeddings. □
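The counting in the proof of Theorem 1 can be checked with two small functions. This is a sketch for the arithmetic only: `shared_vertex_bound` is our name for the $(k-1)!\,8^{k-1}$ bound on k triconnected components that all share one vertex, and `theorem1_bound` multiplies it by the at most k shared vertices.

```python
from math import factorial


def shared_vertex_bound(k: int) -> int:
    """(k-1)! * 8^(k-1): k triconnected components that all share one vertex."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return factorial(k - 1) * 8 ** (k - 1)


def theorem1_bound(k: int) -> int:
    """k * (k-1)! * 8^(k-1) = k! * 8^(k-1), allowing at most k shared vertices."""
    return k * shared_vertex_bound(k)


for k in range(1, 6):
    print(k, shared_vertex_bound(k), theorem1_bound(k))
```

For k = 2 the shared-vertex count is 8, matching the eight embeddings of G₁ ∪ G₂ described in the proof; for k = 1 both functions give a single embedding.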

Using this bound, we can produce a parameterized algorithm that tests 
whether G is upward planar. 

Theorem 2. There is an $O(k!\,8^k n^3)$-time algorithm to test whether a biconnected graph is upward planar, where n is the number of vertices and k is the number of triconnected components.

Proof. Our algorithm works as follows: first it divides the input graph G into triconnected components, which can be done in quadratic time [18]. It then tests each possible embedding of G for upward planarity. By Theorem 1, we have at most $k!\,8^{k-1}$ embeddings. From Euler's formula, we know that for each embedding, there are O(n) possible outer faces. Bertolazzi et al. [2] give a quadratic-time algorithm to determine whether a given embedding and outer face correspond to an upward planar drawing. Thus we can run the algorithm for each possible embedding and outer face, giving a time complexity of $O(k!\,8^{k-1} \cdot n \cdot n^2) = O(k!\,8^k n^3)$. □
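The algorithm in the proof of Theorem 2 is a double loop over embeddings and outer faces. The sketch below shows only that control structure; `embeddings`, `faces_of` and `is_upward` are hypothetical stand-ins for the Theorem 1 enumeration, the face list of an embedding, and the quadratic-time test of Bertolazzi et al. [2], none of which is implemented here.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple, TypeVar

E = TypeVar("E")


def find_upward_embedding(
    embeddings: Iterable[E],
    faces_of: Callable[[E], Iterable[Hashable]],
    is_upward: Callable[[E, Hashable], bool],
) -> Optional[Tuple[E, Hashable]]:
    """Return the first (embedding, outer face) pair accepted by is_upward, or None."""
    for embedding in embeddings:
        for face in faces_of(embedding):
            # Stand-in for the quadratic-time test of Bertolazzi et al. [2].
            if is_upward(embedding, face):
                return embedding, face
    return None
```

With at most $k!\,8^{k-1}$ embeddings and O(n) faces each, the inner test runs $O(k!\,8^{k-1} n)$ times, which gives the stated bound when each test costs O(n²).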

Our bound on the number of possible embeddings is, in many cases, much larger than the actual number of possible embeddings. For example, if each vertex is common to at most two triconnected components, we have only $8^{k-1}$ possible embeddings, rather than $k!\,8^{k-1}$. We also note that Theorem 1 does not depend on upward planarity. Therefore a similar technique could be applied to other graph drawing problems in which we have an algorithm that solves the problem given a specific embedding.



4 Edge Contraction 

We now investigate how we can join upward planar embeddings of biconnected 
graphs in order to obtain an upward planar embedding of a general graph. The 
joining is achieved by first connecting the biconnected graphs with an edge, and 
then contracting the edge. Thus we will first examine the effect of edge contrac- 
tion on upward embeddings. Notably, we will determine conditions under which 
contraction of an edge in an upward planar embedding produces an embedding 
that is still upward planar. 

Throughout this section, we let G be an upward planar graph with upward planar embedding Γ, and let ε = (s, t) be the edge that we wish to contract. We also assume that both s and t have degree greater than one; if not, it is easy to see that contracting ε will result in an upward planar embedding. Thus we will consider the edge-ordering neighbours of ε. Throughout this section, we will use the following labels. Let α = (a, t) and β = (b, t) be the clockwise and counterclockwise neighbours, respectively, of ε around t in the embedding Γ, and let γ = (c, s) and δ = (d, s) be the counterclockwise and clockwise neighbours of ε around s (Figure 2a).

Since ε is an arbitrary edge, we must consider the orientations of α, β, γ, and δ. As shorthand, if α or β is oriented towards t, we say that it is oriented inward, and similarly for when γ or δ is oriented towards s. If α or β is oriented away from t, we say that it is oriented outward, and similarly for γ and δ.

Fig. 2. The vertices around the edge ε, and the edge orientations that we consider.

In this paper, we will only consider the four cases for the orientations of α, β, γ, and δ that will be used in Section 6, and we only give the lemma statements. The remaining cases, as well as the complete proofs, are given in the thesis from which this work is derived [5].

Case 1: α and β oriented outward, γ and δ oriented arbitrarily (Figure 2b),

Case 2: γ and δ oriented inward, α and β oriented arbitrarily (Figure 2c),

Case 3: α and γ oriented outward, β and δ oriented inward (Figure 2d), and

Case 4: α and γ oriented inward, β and δ oriented outward (Figure 2e).
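A small helper makes the four orientation patterns explicit. In this hypothetical sketch, α and β are given relative to t and γ and δ relative to s, each as the string "in" or "out"; the function returns the first matching case number, or None for the patterns this paper does not treat (those are handled in [5]). Cases 1 and 2 can overlap, in which case Case 1 is reported.

```python
from typing import Optional

_ORIENTATIONS = {"in", "out"}


def contraction_case(alpha: str, beta: str, gamma: str, delta: str) -> Optional[int]:
    """Classify the neighbours of eps = (s, t) into Cases 1-4 of Section 4.

    alpha and beta are orientations relative to t; gamma and delta relative to s.
    """
    for name, value in (("alpha", alpha), ("beta", beta),
                        ("gamma", gamma), ("delta", delta)):
        if value not in _ORIENTATIONS:
            raise ValueError(f"{name} must be 'in' or 'out', got {value!r}")
    if alpha == "out" and beta == "out":
        return 1  # in an upward planar embedding, eps is then the only edge entering t
    if gamma == "in" and delta == "in":
        return 2  # in an upward planar embedding, eps is then the only edge leaving s
    if (alpha, gamma, beta, delta) == ("out", "out", "in", "in"):
        return 3  # covered by Lemma 2
    if (alpha, gamma, beta, delta) == ("in", "in", "out", "out"):
        return 4  # covered by Lemma 2
    return None
```

Cases 1 and 2 correspond to the degree conditions of Lemma 1, because incoming (outgoing) edges around a vertex are consecutive in an upward planar embedding.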

In all four cases, we show that G/ε is upward planar with embedding Γ/ε. The proof of Lemma 1, which proves Cases 1 and 2, is a straightforward extension of a lemma by Hutton and Lubiw [19], and Lemma 2, which proves Cases 3 and 4, can be easily shown using the characterization given by Bertolazzi et al. [2]. We omit both proofs.

Lemma 1. If deg⁻(t) = 1 (deg⁺(s) = 1), then G/ε is upward planar with upward planar embedding Γ/ε. □



Lemma 2. If the edges α and γ are oriented outward (inward), and β and δ are oriented inward (outward), then G/ε is upward planar with upward planar embedding Γ/ε. □



5 Node Accessibility 

We now define a notion of accessibility of vertices from different parts of the 
outer face, namely the area above or below the drawing of G. This, along with 
the edge contraction results from the previous section, will help us join together 
upward planar subgraphs. 

Given an upward planar graph G with a specified upward planar embedding Γ and outer face F, we say that the vertex v is accessible from above (below) if there is an upward planar drawing of G corresponding to the specified embedding and outer face such that a monotone curve that does not cross any edges can be drawn from v to a point above (below) the drawing of G.

Note that if a monotone curve can be drawn from v to a point p above the
drawing of G, we can draw a monotone curve from v to any other point q above 
the drawing of G by appropriately modifying the curve from v to p. Therefore 
our definition of accessibility from above is not dependent on the point to which 
we draw the curve. 

In order to efficiently test whether or not a vertex is accessible from above 
or from below, we wish to derive conditions for vertex accessibility that we can 
obtain directly from the planar embedding, rather than from a graph drawing. 
With the conditions below, we can test whether a vertex v is accessible from 
above or from below in O(n) time.
Theorem 3. If the vertex v is on the outer face and has an outgoing (incoming) edge ε that is on the outer face, then v is accessible from above (below).

Proof. (outline) We can first show an equivalent definition of accessibility from above and from below: v is accessible from above if there is an upward planar drawing of G such that we can draw a monotone curve from v to a point p that is above v and on the outer face. The point p need not be above the drawing of G. Proving that this definition is equivalent is long and largely technical. To prove it, we show how to take a drawing in which we can draw a monotone curve from v to p, and modify this drawing so that we can draw a new curve from v to a new point q that is above the drawing of G.

It is then easy to see that if v has an outgoing edge on the outer face, then we can draw a curve from v to a point above v on the outer face. □

Theorem 4. A source (sink) v is accessible from below (above) if and only if it 
is a big angle on the outer face. 

Proof. As shown by Bertolazzi et al. [2], in any upward planar drawing of G, if v is a big angle on the outer face, then the angle of the outer face formed at v must be greater than 180°, and hence the outer face must include some area below v. Thus using the alternate definition of accessibility from the proof of the previous theorem, we can draw a monotone curve from v to a point below v on the outer face, and hence v is accessible from below.

To show the other direction, we note that if v is a source, there is only one face below v, and hence v must be a big angle on that face. Since v is accessible from below, that face must be the outer face. □
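Theorems 3 and 4 give tests that use only the embedding and the outer face. The sketch below combines them, assuming a hypothetical record `OuterFaceInfo` that summarises how v meets the outer face in one embedding; because Theorem 3 is only a sufficient condition, the function returns None wherever neither theorem decides the question.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OuterFaceInfo:
    """Hypothetical summary of how vertex v meets the outer face in one embedding."""
    is_source: bool
    is_sink: bool
    out_edge_on_outer_face: bool   # some outgoing edge of v lies on the outer face
    in_edge_on_outer_face: bool    # some incoming edge of v lies on the outer face
    big_angle_on_outer_face: bool  # v is a big angle on the outer face


def accessibility(info: OuterFaceInfo) -> Tuple[Optional[bool], Optional[bool]]:
    """Return (from_above, from_below); None means Theorems 3 and 4 do not decide."""
    above: Optional[bool] = None
    below: Optional[bool] = None
    # Theorem 3: a sufficient condition only.
    if info.out_edge_on_outer_face:
        above = True
    if info.in_edge_on_outer_face:
        below = True
    # Theorem 4: an exact characterization for sources and sinks.
    if info.is_source:
        below = info.big_angle_on_outer_face
    if info.is_sink:
        above = info.big_angle_on_outer_face
    return above, below
```

Each branch only inspects data attached to v and the outer face, in line with the claim that these conditions can be checked in O(n) time from the embedding.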

6 Joining Biconnected Components 

We are now ready to show how to join multiple upward planar biconnected components to form a larger upward planar graph. Throughout this section, we let G₁, ..., Gₖ, with upward planar embeddings Γ₁, ..., Γₖ and outer faces F₁, ..., Fₖ, be connected (not necessarily biconnected) components. We will start by joining the components by identifying the vertices v₁ ∈ V(G₁), ..., vₖ ∈ V(Gₖ). The resulting graph we call G, with upward planar embedding Γ. By repeating this process, we can construct any arbitrary graph, starting from its biconnected components.

To help us prove our conditions for joining graphs, we first show that for all i, except for at most one, vᵢ must be on the outer face of the corresponding Gᵢ in order for G to be upward planar.

Lemma 3. If there is more than one value of i such that Gᵢ does not have an upward planar embedding with vᵢ on the outer face, then G is not upward planar.

Proof. (outline) We assume that we are given an upward planar drawing of G, and show that if we consider this drawing, restricted to the subgraph Gᵢ, then vᵢ must be on the outer face for i > 1 (after suitably renumbering the components). □
Fig. 3. Joining Gⱼ to Gⱼ₋₁ and then to Gⱼ₋₂ when vⱼ, vⱼ₋₁, and vⱼ₋₂ are all sources.



We identify three cases that we will use to join together components to 
construct arbitrary graphs. 

Case 1: All vᵢ are sources, or all are sinks.

Case 2: k = 2 and v₁ is a source or a sink.

Case 3: None of the vᵢ are sources or sinks, and no vᵢ is a cutvertex of Gᵢ.

The way in which we will use these cases is as follows: we will join together all components in which vᵢ is a source to form a larger component by using Case 1. Similarly, we join together all components in which vᵢ is a sink. Using Case 3, we join together all components in which vᵢ is neither a source nor a sink. Finally, using Case 2, we join together the three new components. In each case, we give necessary and sufficient conditions for the upward planarity of G.
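The order in which the three cases are applied at a single cutvertex can be written down as a plan. This sketch assumes the role of v in each component ("source", "sink", or "neither") is already known, names merged components by concatenating their labels, and follows the order used in the proof of Theorem 8 (Case 3 first, then Case 1 for sources and for sinks, then Case 2). The function `join_plan` is hypothetical; it neither checks the non-cutvertex precondition of Case 3 nor tests any of the conditions.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, List[str]]


def join_plan(roles: Dict[str, str]) -> List[Step]:
    """Plan the joins at one cutvertex v.

    roles maps a component label to the role of v in that component:
    "source", "sink" or "neither". Returns (case, operands) steps in order.
    """
    groups: Dict[str, List[str]] = {"neither": [], "source": [], "sink": []}
    for comp, role in roles.items():
        if role not in groups:
            raise ValueError(f"unknown role {role!r} for component {comp!r}")
        groups[role].append(comp)

    steps: List[Step] = []
    merged: Dict[str, str] = {}
    # Case 3 merges the "neither" group; Case 1 merges sources, then sinks.
    for role, case in (("neither", "Case 3"), ("source", "Case 1"), ("sink", "Case 1")):
        comps = sorted(groups[role])
        if len(comps) > 1:
            steps.append((case, comps))
        if comps:
            merged[role] = comps[0] if len(comps) == 1 else "(" + "+".join(comps) + ")"

    # Case 2 attaches the merged source group, then the merged sink group.
    current = merged.get("neither")
    for role in ("source", "sink"):
        if role not in merged:
            continue
        if current is None:
            current = merged[role]
        else:
            steps.append(("Case 2", [merged[role], current]))
            current = "(" + merged[role] + "+" + current + ")"
    return steps


print(join_plan({"A": "source", "B": "source", "C": "sink", "D": "neither"}))
```

In every Case 2 step, the first operand is the component in which v is a source or a sink, as Case 2 requires.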

Theorem 5. If vᵢ is a source (sink) for all i, then G is upward planar if and only if for all j > 1, Gⱼ has an upward planar drawing in which vⱼ is on the outer face.

Proof. Since vⱼ is on the outer face and is a source, it must have an outgoing edge on the outer face, and so by Theorem 3, it must be accessible from above. So we can draw Gⱼ₋₁ in the outer face of Gⱼ above vⱼ, draw an edge from vⱼ to vⱼ₋₁, and contract the edge, using Lemma 1, as illustrated in Figure 3. We start this procedure from the last component, and note that if vⱼ and vⱼ₋₁ are on the outer faces in their respective components, then after vⱼ and vⱼ₋₁ have been identified, the new vertex is still a source on the outer face of the new component.

The other direction follows directly from Lemma 3. □



Theorem 6. Given two components G₁ and G₂ where v₁ is a source (sink), G is upward planar if and only if G₁ has an upward planar embedding in which v₁ is accessible from below (above), or G₂ has an upward planar embedding in which v₂ is accessible from above (below).

Proof. (outline) We omit the proof that the condition is sufficient as the proof is similar to the proof above. To show the other direction, we assume that G is upward planar, v₁ is not accessible from below, and v₂ is not accessible from above, and derive a contradiction. We first show that if v₁ and v₂ are not cutvertices of their respective components, then G is not upward planar. We do this by showing that if v₁ and v₂ are not cutvertices, then G₁ is drawn entirely within a single face of G₂. We then consider the faces of G₂ in which G₁ can be drawn, and show that in every case we obtain a contradiction.

We then show that if v₁ is a cutvertex, we can find a subgraph of G₁ in which v₁ is not a cutvertex and is not accessible from below. Similarly, if v₂ is a cutvertex, we can find a subgraph of G₂ in which v₂ is not a cutvertex and is not accessible from above. From the result above, these two subgraphs cannot be joined to yield an upward planar graph. Since this subgraph of G is not upward planar, G cannot be upward planar. □

Theorem 7. If for all i, vᵢ has indegree and outdegree at least 1 and vᵢ is not a cutvertex, then G is upward planar if and only if for all i > 1, Gᵢ has an upward planar embedding in which the outer face Fᵢ is flattenable at vᵢ.

Proof. (outline) Again, we omit the proof that the condition is sufficient. We assume that we are only given two components G₁ and G₂ such that Fᵢ is not flattenable at vᵢ for i = 1, 2: if we have more than two components such that Fᵢ is not flattenable at vᵢ for more than one value of i, we can take any two such components as G₁ and G₂ and show that G is not upward planar. Again, since v₁ and v₂ are not cutvertices, in any upward planar drawing of G, G₁ must be drawn entirely within a single face of G₂ and vice versa, and in all cases, we obtain a contradiction and hence G cannot be upward planar. □
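Read together with Lemma 3, Theorems 5, 6 and 7 reduce each join to a few yes/no questions about the components. The sketch below encodes those conditions, assuming that each Gᵢ is upward planar and that the per-component facts (whether some upward planar embedding puts vᵢ on the outer face, makes it accessible from a given side, or makes Fᵢ flattenable at vᵢ) have already been computed. Because the numbering of the components is arbitrary, "for all j > 1" is read as "all but at most one component". All function names are hypothetical.

```python
from typing import Sequence


def all_but_one(flags: Sequence[bool]) -> bool:
    """True if at most one entry is False."""
    return sum(1 for ok in flags if not ok) <= 1


def case1_upward_planar(outer_face_possible: Sequence[bool]) -> bool:
    """Theorem 5 with Lemma 3: all v_i are sources (or all are sinks).

    outer_face_possible[i] says whether G_i has an upward planar embedding
    with v_i on the outer face.
    """
    return all_but_one(outer_face_possible)


def case2_upward_planar(v1_role: str, v1_below: bool, v1_above: bool,
                        v2_above: bool, v2_below: bool) -> bool:
    """Theorem 6 for two components, where v1 is a source or a sink.

    Each flag says whether some upward planar embedding of the component
    makes its vertex accessible from the named side.
    """
    if v1_role == "source":
        return v1_below or v2_above
    if v1_role == "sink":
        return v1_above or v2_below
    raise ValueError("v1_role must be 'source' or 'sink'")


def case3_upward_planar(flattenable_possible: Sequence[bool]) -> bool:
    """Theorem 7: no v_i is a source, a sink or a cutvertex.

    flattenable_possible[i] says whether G_i has an upward planar embedding
    whose outer face F_i is flattenable at v_i.
    """
    return all_but_one(flattenable_possible)
```

Cases 1 and 3 share the same "all but one" structure; only Case 2 distinguishes the two sides of the drawing.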

We can now give a fixed-parameter algorithm for determining when a graph 
is upward planar, with the parameter being the number of triconnected compo- 
nents and the number of cutvertices. 

Theorem 8. There is an $O(k!\,8^k n^3 + 2^{3\cdot 2^{\ell}} k^{3\cdot 2^{\ell}} k!\,8^k n)$-time algorithm to test whether a graph G is upward planar, where n is the number of vertices, k is the number of triconnected components, and ℓ is the number of cutvertices.

Proof, (outline) Our algorithm works by splitting G into biconnected compo- 
nents, generating all possible upward planar embeddings for the biconnected 
components, and joining them together, keeping track of all possible upward 
planar embeddings for the new components. 

We can split G into biconnected components in O(n) time [26]. For each biconnected component, we generate all possible upward planar embeddings, along with their possible outer faces, as shown in Section 3. In total, this takes $O(k!\,8^k n^3)$ time, and generates $O(k!\,8^k n)$ embeddings.

For each cutvertex v, we first join together all components in which v has indegree and outdegree at least one by using Theorem 7, producing the new component G_x. We then use Theorem 5 to join together all components in which v is a source, producing G_y, and all components in which v is a sink, producing G_z. Then, using Theorem 6, we join G_y to G_x and join G_z to the resulting component, producing a new component G_v. We remove all the components that contained v, replace them with G_v, and continue inductively. In this step, we may also detect that no possible upward planar embedding exists.

Since the conditions given by the theorems in this section only depend on the accessibility of vertices or on vertices being on the outer face, we can greatly limit the number of embeddings that we must consider. For example, if a vertex u is not on the outer face (and hence not accessible from above or from below) in the embeddings Γ₁ and Γ₂ of G_v, and we later join G_v to another component at u, then Γ₁ can be used to create an upward planar embedding of the new graph if and only if Γ₂ can also be used. Thus we only need to consider either Γ₁ or Γ₂.

We can show that, for the $i$th cutvertex, the number of embeddings that we produce is less than $2^{3\cdot 2^i} k^{3\cdot 2^i} k!\,8^k$, and producing them takes at most $O(2^{3\cdot 2^i} k^{3\cdot 2^i} k!\,8^k n)$ time. Since we have $\ell$ cutvertices, summing over all the steps for joining biconnected components gives a time of at most $O(2^{3\cdot 2^\ell} k^{3\cdot 2^\ell} k!\,8^k n)$.

Thus in total, the algorithm runs in $O(k!\,8^k n^3 + 2^{3\cdot 2^\ell} k^{3\cdot 2^\ell} k!\,8^k n)$ time. □



7 Conclusions and Future Work 

In this paper, we first developed a parameterized algorithm for upward planarity testing of biconnected graphs. This algorithm runs in $O(k!\,8^k n^3)$ time, where $k$ is the number of triconnected components. We then showed conditions under which contracting an edge in an upward planar graph results in a new graph that is still upward planar, and we introduced a notion of vertex accessibility from above and below. Using these results, we then gave necessary and sufficient conditions for joining biconnected graphs to form a new upward planar graph. This allowed us to obtain a parameterized algorithm for upward planarity testing in general graphs. Our algorithm runs in $O(k!\,8^k n^3 + 2^{3\cdot 2^\ell} k^{3\cdot 2^\ell} k!\,8^k n)$ time, where $n$ is the number of vertices, $k$ is the number of triconnected components, and $\ell$ is the number of cutvertices.

Our running time analysis contains many potential overestimations. Better 
analysis may yield a smaller running time. It would also be interesting to im- 
plement the algorithm and see how well it performs in practice. In particular, 
since the complexity analysis includes bookkeeping of embeddings that are not 
upward planar, it is very likely that the running time in practice will be much 
smaller than that given in Theorem 8. 

Another possible research direction in applying parameterized complexity to upward planarity testing is obtaining a parameterized algorithm that determines whether or not a graph has a drawing in which at most $k$ of the edges point downward. For $k = 0$, this is simply upward planarity testing. For $k$ equal to half the number of edges, this is trivial: take any drawing of the graph in which no edge is drawn horizontally. Either this drawing, or the drawing that results from flipping it vertically, has at least half the edges pointing upward. Thus it is possible that between these two extremes, we may be able to obtain a parameterized result.
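The flipping argument for the trivial case can be checked mechanically. The following is a minimal sketch, assuming a drawing is summarized only by a hypothetical mapping `y` from vertices to vertical coordinates in which no edge is horizontal; the helper names are illustrative and not taken from the paper. Flipping negates every coordinate, which preserves planarity and reverses the direction of every edge.

```python
def orient_majority_upward(edges, y):
    """Return y-coordinates of a drawing in which at least half of the
    directed edges (u, v) point upward, i.e. y[u] < y[v].

    Assumes no edge is horizontal (y[u] != y[v] for every edge)."""
    for u, v in edges:
        if y[u] == y[v]:
            raise ValueError(f"edge ({u}, {v}) is horizontal")
    upward = sum(1 for u, v in edges if y[u] < y[v])
    if 2 * upward >= len(edges):
        return dict(y)
    # Flipping vertically (negating every y-coordinate) turns each
    # downward edge into an upward one and vice versa.
    return {vertex: -coord for vertex, coord in y.items()}


def count_downward(edges, y):
    """Number of directed edges (u, v) drawn pointing downward."""
    return sum(1 for u, v in edges if y[u] > y[v])


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
y = {"a": 3, "b": 2, "c": 1, "d": 0}   # three of the four edges point down
flipped = orient_majority_upward(edges, y)
print(count_downward(edges, flipped))  # 1
```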



A Parameterized Algorithm for Upward Planarity Testing 167 



Parameterized complexity is a fairly new area and seeks to find efficient 
solutions to hard problems. Many problems in graph drawing have been shown to 
be NP-complete, and so parameterized complexity may be able to offer solutions 
to many of these problems. Some possible parameters worth examining are the 
height, width, or area of the drawing, the maximum indegree or outdegree in a 
graph, the number of faces in a planar graph, or the number of sources or sinks. 

Acknowledgements. I gratefully acknowledge the suggestions of the anony- 
mous referees. 
Abstract. In this paper we study efficient algorithms for computing
equilibrium price in the Fisher model for a class of nonlinear concave 
utility functions, the logarithmic utility functions. We derive a duality 
relation between buyers and sellers under such utility functions, and use 
it to design a polynomial time algorithm for calculating equilibrium price, 
for the special case when either the number of sellers or the number of 
buyers is bounded by a constant. 

1 Introduction 

Equilibrium price is a vital notion in classical economic theory, and has been a central issue in computational economics. For the Fisher model, there are $n$ buyers, each with an initially endowed amount of cash and with a non-decreasing concave utility function, and there are $m$ goods (w.l.o.g., each of a unit amount) for sale. At the equilibrium price, all goods are sold, all cash is spent, and the goods purchased by each buyer maximize its utility function at the equilibrium price vector, subject to its initial endowment. It can be viewed as a special case of the more widely known Arrow-Debreu model.

Arrow and Debreu [1] proved the existence of equilibrium price assuming goods are divisible. Their proof was based on the fixed point theorem and thus was not constructive. Gale made an extensive study of linear utility functions [7]. On the algorithmic side, Scarf's pioneering work [11] pointed to the possibility of obtaining the equilibrium price in the limit through an iterative combinatorial algorithm. However, after Scarf's work, the algorithmic study of the computation of equilibrium prices saw no significant progress for a long time.

Recently, Deng, Papadimitriou, and Safra [3] started a study of the algorithmic complexity of the equilibrium price problem in the Arrow-Debreu model. Complexity and algorithmic results for the equilibrium price were obtained. In addition, an algorithmic concept of approximate equilibrium was introduced that led to a polynomial time approximation scheme for the equilibrium price of an economy with a bounded number of indivisible goods. Even though the work was presented for linear utility functions, it was pointed out that many of the results hold for general concave utility functions. Still, a crucial open question remains: is there a polynomial time algorithm for the equilibrium price when the numbers of goods and agents are part of the input size?

Subsequently, a fair number of exploratory works have followed on equilibrium and approximate equilibrium prices under linear utility functions, studying the problem with general input size. Devanur et al. [4] obtained a polynomial time algorithm for a price equilibrium in the Fisher model, in which agents are initially endowed with a certain amount of money. Later, building on the duality-type algorithm proposed in [4], Jain et al. [8], Devanur and Vazirani [5], and Garg and Kapoor [9] gave different polynomial time approximation schemes for linear utility functions in the Fisher model. Unfortunately, those results apply only to linear utility functions, which still leaves a gap towards general concave utility functions.

In this paper, we develop a solution for a class of concave utility functions, the 
logarithmic utility functions, by making use of an interesting duality relationship 
between the amount of cash of buyers and the amount of commodities of the 
sellers. Polynomial time algorithms are obtained when the number of sellers (or 
buyers) is bounded by a constant. 

In Section 2, we introduce a formal formulation of the problem. In Section 3, 
we derive and discuss a duality relationship between buyers and sellers. In Sec- 
tion 4, we present our polynomial time algorithms. We conclude our work with 
remarks and discussion in Section 5. 

2 A Fisher Equilibrium Problem 

We study a market consisting of a set $A$ of $n$ buyers and a set $B$ of $m$ divisible goods (i.e., $|A| = n$ and $|B| = m$). Let $e_i$ be the initial endowment of buyer $i$, and let $q_j$ be the quantity of good $j$. Though it is common to scale the quantities of goods to unit amounts, for clarity we keep the notation $q_j$ for the goods, especially because of the duality relationship we are going to obtain.

We consider a class of logarithmic utility functions:

$$u_i(x_{i,1},\dots,x_{i,m}) = \sum_{j=1}^{m} a_{i,j}\ln(x_{i,j}+\epsilon_{i,j})$$

for each buyer $i$, where $a_{i,j},\epsilon_{i,j} > 0$ are given constants. For each $i,j$, define the functions $f_{i,j}(t) = \frac{a_{i,j}}{t+\epsilon_{i,j}}$. Clearly, the first-order derivative of the utility function $u_i$ with respect to good $j$ depends only on the single variable $x_{i,j}$, and is in fact equal to $f_{i,j}(x_{i,j})$.

The functions $f_{i,j}(t)$ are non-negative and strictly decreasing in $t$ (for $t > -\epsilon_{i,j}$). Each $f_{i,j}(t)$ represents the marginal utility of buyer $i$ when the amount of good $j$ held by $i$ is $t$, $0 \le t \le q_j$. In comparison, linear utility functions correspond to the case where the marginal utility functions are all constant.

Given a price vector $P = (p_1,\dots,p_m) > 0$, each buyer $i$ has a unique optimal allocation $(x^*_{i,1}(P),\dots,x^*_{i,m}(P))$ (see, e.g., [10]) that maximizes the utility function $u_i$ of buyer $i$:

$$\max\ u_i(x_{i,1},\dots,x_{i,m}) \quad \text{s.t.}\quad p_1x_{i,1}+\dots+p_mx_{i,m} = e_i,\quad x_{i,j} \ge 0\ \ \forall\, 1 \le j \le m. \tag{1}$$

The Fisher equilibrium problem is to find a price vector $P = (p_1,\dots,p_m)$ (and the corresponding optimal allocations $x^*(P)$) such that all money is spent and all goods are cleared, i.e.,

$$\forall i \in A:\ \sum_{j\in B} p_j\,x^*_{i,j}(P) = e_i, \qquad \forall j \in B:\ \sum_{i\in A} x^*_{i,j}(P) = q_j.$$

Such a price vector $P$ is called a (market) equilibrium price vector.

We relax the last constraint of (1) to obtain a relaxed program. For each buyer $i$,

$$\max\ u_i(x_{i,1},\dots,x_{i,m}) \quad \text{s.t.}\quad p_1x_{i,1}+\dots+p_mx_{i,m} = e_i,\quad x_{i,j} > -\epsilon_{i,j}\ \text{for all } j. \tag{2}$$

Note that the value of the utility $u_i$ approaches $-\infty$ as any $x_{i,j}$ approaches $-\epsilon_{i,j}$. Furthermore, the feasible region of $(x_{i,1},\dots,x_{i,m})$ in $\mathbb{R}^m$ is bounded. Hence the maximum of $u_i$ is achieved at some interior point of the region, and can be found by applying the Lagrange multiplier method. Let



$$F(x_{i,1},\dots,x_{i,m},\eta) = u_i(x_{i,1},\dots,x_{i,m}) - \eta\,(p_1x_{i,1}+\dots+p_mx_{i,m} - e_i).$$

Setting the derivatives to zero, we obtain

$$F_\eta = e_i - (p_1x_{i,1}+\dots+p_mx_{i,m}) = 0,$$
$$F_{x_{i,j}} = \frac{a_{i,j}}{x_{i,j}+\epsilon_{i,j}} - \eta\,p_j = 0,\quad j = 1,2,\dots,m.$$



172 



N. Chen et al. 



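The solution $(x'_{i,1}(P),\dots,x'_{i,m}(P))$ is then:

$$\frac{1}{\eta} = \frac{e_i + \sum_{k=1}^{m} \epsilon_{i,k}p_k}{\sum_{k=1}^{m} a_{i,k}}, \qquad x'_{i,j} = \frac{a_{i,j}}{p_j}\cdot\frac{e_i + \sum_{k=1}^{m} \epsilon_{i,k}p_k}{\sum_{k=1}^{m} a_{i,k}} - \epsilon_{i,j},\quad j = 1,2,\dots,m.$$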

Note that $x'_{i,j} > -\epsilon_{i,j}$, and thus $(x'_{i,1}(P),\dots,x'_{i,m}(P))$ is indeed an interior point of the domain on which the utility function $u_i$ is defined (as a real-valued function).

Recall that we use $(x^*_{i,1}(P),\dots,x^*_{i,m}(P))$ to denote the optimal solution of (1). In addition, when there is no ambiguity, we drop $P$ from the notation of $x'$ and $x^*$.
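The closed form above is easy to check numerically. The sketch below (the function name `relaxed_optimum` is hypothetical) evaluates $x'_{i,j}$ and $\eta$ for a single buyer with exact rational arithmetic, so that the budget constraint and the stationarity condition $a_{i,j}/(x'_{i,j}+\epsilon_{i,j}) = \eta p_j$ can be verified exactly. It assumes all $a_{i,j}$, $\epsilon_{i,j}$, $p_j$ and $e_i$ are positive.

```python
from fractions import Fraction


def relaxed_optimum(a, eps, p, e):
    """Closed-form optimum x' of the relaxed program (2) for one buyer.

    a[j], eps[j] > 0 are the utility constants a_{i,j}, eps_{i,j};
    p[j] > 0 are prices and e > 0 is the budget e_i.
    Returns (x, eta), where eta is the Lagrange multiplier."""
    inv_eta = (e + sum(ej * pj for ej, pj in zip(eps, p))) / sum(a)
    x = [aj * inv_eta / pj - ej for aj, ej, pj in zip(a, eps, p)]
    return x, 1 / inv_eta


a = [Fraction(1), Fraction(2)]
eps = [Fraction(1), Fraction(1)]
p = [Fraction(1), Fraction(1)]
x, eta = relaxed_optimum(a, eps, p, Fraction(2))
# x == [1/3, 5/3] and eta == 3/4
```

Exact fractions are used only so that equalities can be tested without rounding; floating point would work the same way in practice.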



Proposition 1. For any $i,j$, if $x'_{i,j} \le 0$, then $x^*_{i,j} = 0$.

Proof. Let $i$ be fixed. Assume to the contrary that there exists $j_1 \in B$ such that $x'_{i,j_1} \le 0$ and $x^*_{i,j_1} > 0$. Because of the budget constraint, there must exist $j_2 \in B$ such that $x'_{i,j_2} > x^*_{i,j_2} \ge 0$. In the following we focus only on the local allocation to $j_1$ and $j_2$.

First observe that one unit of $j_1$ is equivalent to $p_{j_1}/p_{j_2}$ units of $j_2$ in price. By the definition of optimality, the utility would not increase if $x^*_{i,j_1}$ were decreased with a corresponding increase of $x^*_{i,j_2}$. Therefore, the marginal utility gained by spending one unit of money on $j_1$ must be no less than that gained by spending it on $j_2$:

$$\frac{f_{i,j_1}(x^*_{i,j_1})}{p_{j_1}} \ge \frac{f_{i,j_2}(x^*_{i,j_2})}{p_{j_2}}.$$

Otherwise, we could decrease $x^*_{i,j_1}$ (and increase $x^*_{i,j_2}$ accordingly) to obtain a higher utility for buyer $i$.

Using the last inequality and the monotonicity of $f_{i,j}$, we obtain

$$\frac{f_{i,j_2}(x'_{i,j_2})}{p_{j_2}} < \frac{f_{i,j_2}(x^*_{i,j_2})}{p_{j_2}} \le \frac{f_{i,j_1}(x^*_{i,j_1})}{p_{j_1}} < \frac{f_{i,j_1}(x'_{i,j_1})}{p_{j_1}}.$$

The inequality between the first and the last terms implies that an increase in $x'_{i,j_1}$ (and a corresponding decrease in $x'_{i,j_2}$) would give buyer $i$ a higher utility, which contradicts the optimality of $x'(P)$ for (2). □



3 Dual Relation Between Buyers and Sellers

For the sake of presentation, we associate a seller with each good, following [4]. We introduce a bipartite graph to describe the transactions between buyers and sellers.

Definition 1. Given a price vector $P$, the transaction graph $G(P) = (A, B, E)$ consists of vertices $A$, $B$, which represent the collection of buyers and sellers, respectively, and the edge set $E$. For any $i \in A$, $j \in B$, $(i,j) \in E$ if and only if $x^*_{i,j}(P) > 0$.
For $i \in A$, denote its neighborhood by $\Gamma(i) = \{j \in B \mid (i,j) \in E\}$. For $j \in B$, denote its neighborhood by $\Phi(j) = \{i \in A \mid (i,j) \in E\}$.

It follows immediately from Proposition 1 that, for any price vector, we can compute the edge set incident to $i$, i.e., the neighborhood $\Gamma(i)$, as follows. First compute the optimal solution $x'$ of (2). Then, drop the non-positive variables in the solution. For the remaining variables, repeat the above procedure until all solutions are positive. The remaining variables (with positive solutions) yield the corresponding neighborhood set $\Gamma(i)$. We summarize this in the following proposition.

Proposition 2. If $x'_{i,j} > 0$ in the last iteration, then $x^*_{i,j} > 0$.

Note that in each iteration except the last one, at least one variable is dropped. Thus there can be at most $m$ iterations, each taking $O(m)$ time. Hence, for any given price vector, the above algorithm computes $\Gamma(i)$ in $O(m^2)$ time.
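The dropping procedure can be sketched directly. This is a minimal illustration for one buyer, assuming positive constants and exact rational inputs; the function name is hypothetical. Each pass costs $O(m)$ and at most $m$ passes occur, matching the $O(m^2)$ bound, and the recorded values of $1/\eta$ let one observe that $\eta$ never decreases, as argued below.

```python
from fractions import Fraction


def neighborhood_iterative(a, eps, p, e):
    """Sketch of the iterative procedure behind Proposition 2 for one buyer.

    Repeatedly solves the relaxed program (2) restricted to the remaining
    goods and drops every good whose solution is non-positive.
    Returns (Gamma(i), x on Gamma(i), history of 1/eta values)."""
    remaining = list(range(len(p)))
    history = []
    while remaining:
        inv_eta = (e + sum(eps[j] * p[j] for j in remaining)) / sum(a[j] for j in remaining)
        history.append(inv_eta)
        x = {j: a[j] * inv_eta / p[j] - eps[j] for j in remaining}
        kept = [j for j in remaining if x[j] > 0]
        if len(kept) == len(remaining):      # nothing dropped: last iteration
            return set(kept), x, history
        remaining = kept
    return set(), {}, history


F = Fraction
gamma, x, history = neighborhood_iterative(
    a=[F(1), F(1), F(1)], eps=[F(1), F(1), F(1)], p=[F(1), F(1), F(4)], e=F(1))
# gamma == {0, 1}, x == {0: 1/2, 1: 1/2}; 1/eta went from 7/3 down to 3/2
```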

Now, for any given graph $G$, we define for each buyer $i$ the “bias factor” $\lambda_i$ as follows:

$$\lambda_i = \frac{e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}p_j}{\sum_{j\in\Gamma(i)} a_{i,j}}. \tag{3}$$

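Note that in the last iteration of the algorithm that determines the transaction graph, we have $\eta = 1/\lambda_i > 0$, and for any $j \in \Gamma(i)$, $x'_{i,j} = \frac{a_{i,j}}{\eta p_j} - \epsilon_{i,j} = \frac{a_{i,j}}{p_j}\lambda_i - \epsilon_{i,j} > 0$.

At any time during the iteration, define the current value of $1/\eta$ as the value of

$$\frac{e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}p_j}{\sum_{j\in\Gamma(i)} a_{i,j}},$$

where $\Gamma(i)$ is the set of remaining vertices in $B$ at that time. We argue that the value of $\eta$ can only increase in the iterative process. After each iteration except the last one, some variables $x'_{i,j}$ have become non-positive, and those $j$ are dropped from $\Gamma(i)$. Each dropped $j$ satisfies $\frac{a_{i,j}}{\eta p_j} - \epsilon_{i,j} \le 0$, i.e., $\frac{\epsilon_{i,j}p_j}{a_{i,j}} \ge \frac{1}{\eta}$. Removing from the numerator and denominator terms whose ratio $\epsilon_{i,j}p_j/a_{i,j}$ is at least the current value of the whole fraction cannot increase that fraction, so the new value of $1/\eta$ is no larger, i.e., $\eta$ does not decrease.

For any $j \notin \Gamma(i)$, when $j$ is dropped from $\Gamma(i)$ during an iteration, $\frac{a_{i,j}}{\eta p_j} - \epsilon_{i,j} \le 0$ for the value of $\eta$ at that time. Since the value of $\eta$ can only increase (reaching $1/\lambda_i$ at the end), we conclude that $\frac{a_{i,j}}{p_j}\lambda_i - \epsilon_{i,j} \le 0$. The above discussion leads to the following characterization of the transaction graph.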



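Lemma 1. Given a price vector $P > 0$, the transaction graph $G(P)$ satisfies the following conditions:

(a) $\forall j \in \Gamma(i)$, $\lambda_i > \frac{\epsilon_{i,j}}{a_{i,j}}p_j$, and $\forall j \notin \Gamma(i)$, $\lambda_i \le \frac{\epsilon_{i,j}}{a_{i,j}}p_j$, i.e., $\Gamma(i) = \{j \in B \mid \lambda_i > \frac{\epsilon_{i,j}}{a_{i,j}}p_j\}$.

(b) $\forall j \in \Gamma(i)$, $x^*_{i,j}(P) = \frac{a_{i,j}}{p_j}\lambda_i - \epsilon_{i,j}$.

Moreover, of all the subsets of $B$, $\Gamma(i)$ is the only subset that can satisfy Equation (3) and Condition (a).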






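Proof. Conditions (a) and (b) follow from the above discussion. We only need to prove the uniqueness of $\Gamma(i)$. Assume without loss of generality that $\frac{\epsilon_{i,1}}{a_{i,1}}p_1 \le \dots \le \frac{\epsilon_{i,m}}{a_{i,m}}p_m$. Suppose there is another subset $\Gamma'(i) \subseteq B$ satisfying Equation (3) and Condition (a). By Condition (a), both sets are prefixes of this order; assume $\Gamma(i) = \{1,\dots,j\}$ and w.l.o.g. assume $\Gamma'(i) = \{1,\dots,j,\dots,j+k\}$, $k > 0$ (the discussion for the case $k < 0$ is symmetric). Let

$$\lambda_i = \frac{e_i + \sum_{l\in\Gamma(i)} \epsilon_{i,l}p_l}{\sum_{l\in\Gamma(i)} a_{i,l}}, \qquad \lambda'_i = \frac{e_i + \sum_{l\in\Gamma'(i)} \epsilon_{i,l}p_l}{\sum_{l\in\Gamma'(i)} a_{i,l}}.$$

By Condition (a),

$$\frac{\epsilon_{i,j}}{a_{i,j}}p_j < \lambda_i \le \frac{\epsilon_{i,j+1}}{a_{i,j+1}}p_{j+1}, \qquad \frac{\epsilon_{i,j+k}}{a_{i,j+k}}p_{j+k} < \lambda'_i \le \frac{\epsilon_{i,j+k+1}}{a_{i,j+k+1}}p_{j+k+1}.$$

The inequality $\lambda_i \le \frac{\epsilon_{i,j+1}}{a_{i,j+1}}p_{j+1}$ indicates that

$$\frac{e_i + \sum_{l\in\Gamma(i)\cup\{j+1\}} \epsilon_{i,l}p_l}{\sum_{l\in\Gamma(i)\cup\{j+1\}} a_{i,l}} \le \frac{\epsilon_{i,j+1}}{a_{i,j+1}}p_{j+1}.$$

Since $\frac{\epsilon_{i,j+1}}{a_{i,j+1}}p_{j+1} \le \frac{\epsilon_{i,j+2}}{a_{i,j+2}}p_{j+2} \le \dots \le \frac{\epsilon_{i,j+k}}{a_{i,j+k}}p_{j+k}$, we have

$$\frac{e_i + \sum_{l\in\Gamma(i)\cup\{j+1,\dots,j+k\}} \epsilon_{i,l}p_l}{\sum_{l\in\Gamma(i)\cup\{j+1,\dots,j+k\}} a_{i,l}} \le \frac{\epsilon_{i,j+k}}{a_{i,j+k}}p_{j+k}.$$

The left-hand side of the above inequality is $\lambda'_i$, a contradiction to $\frac{\epsilon_{i,j+k}}{a_{i,j+k}}p_{j+k} < \lambda'_i$. □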



Lemma 1 leads to an improved algorithm to compute $\lambda_i$ and the transaction graph $G(P)$ for any given price vector $P$.



Lemma 2. For any given price vector $P > 0$ and buyer $i$, $\lambda_i$ and $\Gamma(i)$ can be computed in $O(m\log m)$ time. Therefore, we can compute $G(P)$ in time $O(nm\log m)$.

Proof. By Lemma 1, we know that $\forall j \in \Gamma(i)$, $\lambda_i > \frac{\epsilon_{i,j}}{a_{i,j}}p_j$, and $\forall j \notin \Gamma(i)$, $\lambda_i \le \frac{\epsilon_{i,j}}{a_{i,j}}p_j$. Thus we design an algorithm as follows.

First, we calculate all $\frac{\epsilon_{i,j}}{a_{i,j}}p_j$, $j = 1,\dots,m$, and sort them into non-decreasing order. Then $\Gamma(i)$ must be a prefix of this order, so it has at most $m$ different choices. For each candidate set $\Gamma(i)$, calculate the current “bias factor” $\lambda_i$ according to Equation (3), and check whether $\Gamma(i)$ is equal to the set $\{j \in B \mid \lambda_i > \frac{\epsilon_{i,j}}{a_{i,j}}p_j\}$. Using running sums over the sorted order, the computation of $\lambda_i$ and this check can be done in $O(1)$ steps per candidate set, and thus the dominating computation time is the sorting of $\frac{\epsilon_{i,j}}{a_{i,j}}p_j$, $j = 1,2,\dots,m$, and the lemma follows. □
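A sketch of the sorting-based procedure in the proof of Lemma 2 follows, assuming positive constants and exact rational inputs (the function name and the threshold label $t_j = \frac{\epsilon_{i,j}}{a_{i,j}}p_j$ are introduced here for illustration). Running sums over the sorted thresholds give each candidate's $\lambda_i$ in $O(1)$, and the consistency check compares $\lambda_i$ only with the largest chosen threshold and the smallest unchosen one, so sorting dominates.

```python
from fractions import Fraction


def bias_factor_sorted(a, eps, p, e):
    """Return (lambda_i, Gamma(i)) for one buyer as in Lemma 2.

    Candidates for Gamma(i) are prefixes of the goods sorted by the
    threshold t_j = eps[j] / a[j] * p[j]; lambda_i for each prefix comes
    from Equation (3) using running sums."""
    m = len(p)
    t = [eps[j] / a[j] * p[j] for j in range(m)]
    order = sorted(range(m), key=lambda j: t[j])
    num, den = e, 0                      # numerator / denominator of (3)
    for size in range(1, m + 1):
        j = order[size - 1]
        num += eps[j] * p[j]
        den += a[j]
        lam = num / den
        # Lemma 1(a): chosen goods need t_j < lambda_i, others t_j >= lambda_i.
        inside_ok = t[j] < lam           # t[j] is the largest chosen threshold
        outside_ok = size == m or lam <= t[order[size]]
        if inside_ok and outside_ok:
            return lam, set(order[:size])
    raise ValueError("no prefix satisfies Lemma 1(a)")


F = Fraction
lam, gamma = bias_factor_sorted(
    a=[F(1), F(1), F(1)], eps=[F(1), F(1), F(1)], p=[F(1), F(1), F(4)], e=F(1))
# lam == 3/2 and gamma == {0, 1}
```

On this input the result agrees with the iterative dropping procedure, where the final value of $1/\eta$ is also $3/2$.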



From now on in this section, we restrict our discussion to the equilibrium 
price vector. 
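Lemma 3. For the equilibrium price vector $P$, we have

$$p_j = \frac{\sum_{i\in\Phi(j)} a_{i,j}\lambda_i}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}} \tag{4}$$

and $\Phi(j) = \{i \in A \mid \frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i > p_j\}$.

Proof. By Lemma 1, we have $(i,j) \in E \Leftrightarrow a_{i,j}\lambda_i > \epsilon_{i,j}p_j$; thus $\Phi(j) = \{i \in A \mid \frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i > p_j\}$. Also from Lemma 1, $\forall (i,j) \in E$, $x^*_{i,j} = \frac{a_{i,j}}{p_j}\lambda_i - \epsilon_{i,j}$. According to the market clearance condition,

$$q_j = \sum_{i\in\Phi(j)} x^*_{i,j} = \sum_{i\in\Phi(j)} \left(\frac{a_{i,j}}{p_j}\lambda_i - \epsilon_{i,j}\right),$$

and solving for $p_j$ gives Equation (4). □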



Lemma 3 describes the optimal behavior of sellers. It allows us to obtain an algorithm to calculate $p_j$ and $\Phi(j)$ on the basis of $\Lambda$, where $\Lambda = (\lambda_1,\dots,\lambda_n)$.



Lemma 4. Assume we know the vector $\Lambda$. Then there exist a unique $p_j$ and a unique $\Phi(j)$ for seller $j$, and they can be computed in $O(n\log n)$ time.

Proof. By Lemma 3, $\Phi(j) = \{i \in A \mid \frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i > p_j\}$. We first sort all $\frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i$, $i = 1,\dots,n$, in non-increasing order. Then $\Phi(j)$ has at most $n$ choices. For each candidate $\Phi(j)$, calculate $p_j$ according to Equation (4), where the sums can be obtained by preprocessing the prefix sums in $O(n)$ time. $\Phi(j)$ is then determined by checking whether the candidate is equal to $\{i \in A \mid \frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i > p_j\}$. □
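Dually, the proof of Lemma 4 can be sketched in the same style. The function name is hypothetical and the sketch assumes positive constants and exact rational inputs. Buyers are sorted by $\frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i$ in non-increasing order, and running sums evaluate Equation (4) for each prefix.

```python
from fractions import Fraction


def seller_price(a_col, eps_col, lam, q):
    """Return (p_j, Phi(j)) for one seller j as in Lemma 4.

    a_col[i] = a_{i,j}, eps_col[i] = eps_{i,j}, lam[i] = lambda_i, q = q_j.
    Candidates for Phi(j) are prefixes of the buyers sorted by
    s_i = a_{i,j} / eps_{i,j} * lambda_i in non-increasing order."""
    n = len(lam)
    s = [a_col[i] / eps_col[i] * lam[i] for i in range(n)]
    order = sorted(range(n), key=lambda i: s[i], reverse=True)
    num, den = 0, q                      # numerator / denominator of (4)
    for size in range(1, n + 1):
        i = order[size - 1]
        num += a_col[i] * lam[i]
        den += eps_col[i]
        p = num / den
        # Lemma 3: chosen buyers need s_i > p_j, the others s_i <= p_j.
        if s[i] > p and (size == n or s[order[size]] <= p):
            return p, set(order[:size])
    raise ValueError("no prefix satisfies the condition of Lemma 3")


F = Fraction
p, phi = seller_price(a_col=[F(1), F(1)], eps_col=[F(1), F(1)],
                      lam=[F(3, 2), F(1, 2)], q=F(1))
# p == 3/4 and phi == {0}: buyer 0 buys (1 / (3/4)) * (3/2) - 1 = 1 = q_j units
```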

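A dual relation is now established between the price vector $P$ and the “bias factor” $\Lambda$:

$$P \;\Longleftrightarrow\; \Lambda$$

$$p_j = \frac{\sum_{i\in\Phi(j)} a_{i,j}\lambda_i}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}} \qquad\Longleftrightarrow\qquad \lambda_i = \frac{e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}p_j}{\sum_{j\in\Gamma(i)} a_{i,j}}$$

$$\Phi(j) = \Big\{i \in A \;\Big|\; \frac{a_{i,j}}{\epsilon_{i,j}}\lambda_i > p_j\Big\} \qquad\Longleftrightarrow\qquad \Gamma(i) = \Big\{j \in B \;\Big|\; \lambda_i > \frac{\epsilon_{i,j}}{a_{i,j}}p_j\Big\}$$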



Although the “bias factor” was first defined in terms of the price vector (Equation (3)), we can treat it as a natural parameter for buyers, dual to the price vector $P$ for sellers: each seller $j$ decides to which collection of buyers to sell his goods, according to the quantity $q_j$ and the buyers' “bias factors” $\Lambda$ (dually, each buyer $i$ decides which collection of goods to buy, according to his total endowment $e_i$ and the sellers' prices $P$). This property will be used in the next section to design an algorithm for finding the equilibrium solution.
4 Solution for Bounded Number of Buyers or Sellers



Our solution depends on the following lemma. 

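Lemma 5. Given a bipartite graph $G(A,B)$, if $\Gamma(i) \neq \emptyset$ and $\Phi(j) \neq \emptyset$ for all $i \in A$, $j \in B$, then the linear equation system

$$\lambda_i = \frac{e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}p_j}{\sum_{j\in\Gamma(i)} a_{i,j}},\ i = 1,\dots,n; \qquad p_j = \frac{\sum_{i\in\Phi(j)} a_{i,j}\lambda_i}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}},\ j = 1,\dots,m$$

is non-degenerate.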

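Proof. Substituting the expression for $p_j$ into the equation for $\lambda_i$, we have

$$\lambda_i \sum_{j\in\Gamma(i)} a_{i,j} = e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}\,\frac{\sum_{k\in\Phi(j)} a_{k,j}\lambda_k}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}}.$$

Now we change the order of the dummy indices $j, k$ on the right-hand side of the equation:

$$\lambda_i \sum_{j\in\Gamma(i)} a_{i,j} = e_i + \sum_{k:\,\Gamma(i)\cap\Gamma(k)\neq\emptyset} \left(\sum_{j\in\Gamma(i)\cap\Gamma(k)} \frac{a_{k,j}\,\epsilon_{i,j}}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}}\right)\lambda_k. \tag{5}$$

Moving the $\lambda_k$ terms to the left, we get an equation system in $\Lambda$: $F\Lambda = e$, where $F = [f_{i,k}]_{n\times n}$ is an $n \times n$ matrix, $e = [e_1,\dots,e_n]^T$, and

$$f_{i,i} = \sum_{j\in\Gamma(i)} a_{i,j}\left(1 - \frac{\epsilon_{i,j}}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}}\right),$$

$$f_{i,k} = \begin{cases} -\sum_{j\in\Gamma(i)\cap\Gamma(k)} \dfrac{a_{k,j}\,\epsilon_{i,j}}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}} & \text{if } \Gamma(i)\cap\Gamma(k)\neq\emptyset,\ k\neq i,\\[2mm] 0 & \text{if } \Gamma(i)\cap\Gamma(k) = \emptyset. \end{cases}$$

Now we show that $\forall k$, $\sum_{i\neq k} |f_{i,k}| < f_{k,k}$. Using $j \in \Gamma(i) \Leftrightarrow i \in \Phi(j)$,

$$f_{k,k} - \sum_{i\neq k} |f_{i,k}| = \sum_{j\in\Gamma(k)} a_{k,j}\left(1 - \frac{\epsilon_{k,j}}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}}\right) - \sum_{j\in\Gamma(k)} a_{k,j} \sum_{i\in\Phi(j),\,i\neq k} \frac{\epsilon_{i,j}}{q_j + \sum_{l\in\Phi(j)} \epsilon_{l,j}}$$

$$= \sum_{j\in\Gamma(k)} a_{k,j}\left(1 - \frac{\sum_{i\in\Phi(j)} \epsilon_{i,j}}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}}\right) = \sum_{j\in\Gamma(k)} a_{k,j}\,\frac{q_j}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}} > 0,$$

where the last inequality uses $\Gamma(k) \neq \emptyset$. Hence $F$ is strictly diagonally dominant by columns and therefore non-singular, so $\mathrm{rank}(F) = n$ and the linear equation system is non-degenerate (once $\Lambda$ is determined, each $p_j$ is determined by the second group of equations). □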
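The matrix $F$ and the column margins from the proof can be built and checked with exact arithmetic. This is an illustrative sketch with hypothetical function names, assuming $\Gamma(i) \neq \emptyset$ and positive constants; the example confirms that each margin equals $\sum_{j\in\Gamma(k)} a_{k,j}q_j/(q_j + \sum_{i\in\Phi(j)}\epsilon_{i,j})$ and is positive.

```python
from fractions import Fraction


def lemma5_matrix(gamma, a, eps, q):
    """Build the n x n matrix F of F * Lambda = e from Equation (5).

    gamma[i] is the set Gamma(i) of goods bought by buyer i; a[i][j],
    eps[i][j] and q[j] are the constants of the text."""
    n, m = len(gamma), len(q)
    phi = [{i for i in range(n) if j in gamma[i]} for j in range(m)]
    denom = [q[j] + sum(eps[i][j] for i in phi[j]) for j in range(m)]
    F = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        F[i][i] += sum(a[i][j] for j in gamma[i])
        for k in range(n):
            for j in gamma[i] & gamma[k]:
                F[i][k] -= a[k][j] * eps[i][j] / denom[j]
    return F, phi, denom


def column_margins(F):
    """f_{k,k} minus the sum of |f_{i,k}| over i != k, for each column k."""
    n = len(F)
    return [F[k][k] - sum(abs(F[i][k]) for i in range(n) if i != k) for k in range(n)]


one = Fraction(1)
gamma = [{0, 1}, {1}]
a = [[one, one], [2 * one, 2 * one]]
eps = [[one, one], [one, one]]
q = [one, one]
F, phi, denom = lemma5_matrix(gamma, a, eps, q)
margins = column_margins(F)
# F == [[7/6, -2/3], [-1/3, 4/3]] and margins == [5/6, 2/3]
```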



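Proposition 3. Given the transaction graph $G$, $P$ and $\Lambda$ can be computed in $O((m+n)^3)$ time.

Proof. We establish $m + n$ linear equations in $\Lambda$ and $P$ from the graph $G$:

$$\lambda_i = \frac{e_i + \sum_{j\in\Gamma(i)} \epsilon_{i,j}p_j}{\sum_{j\in\Gamma(i)} a_{i,j}},\ i = 1,\dots,n; \qquad p_j = \frac{\sum_{i\in\Phi(j)} a_{i,j}\lambda_i}{q_j + \sum_{i\in\Phi(j)} \epsilon_{i,j}},\ j = 1,\dots,m.$$

By Lemma 5, the system of equations is non-degenerate for the $e$, $q$ corresponding to the equilibrium solution. It can therefore be solved, e.g., by Gaussian elimination, in $O((m+n)^3)$ time. □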
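For a fixed candidate graph, the system in the proof above is an ordinary square linear system in the $n + m$ unknowns $\lambda_1,\dots,\lambda_n,p_1,\dots,p_m$. The sketch below (hypothetical names, exact rational arithmetic, and no pivoting strategy beyond finding a nonzero entry) clears the denominators and solves it by Gauss-Jordan elimination in $O((m+n)^3)$ arithmetic operations. It only solves the equations for a given graph; checking that the resulting prices are consistent with Lemma 1 is a separate step.

```python
from fractions import Fraction


def solve_exact(M, rhs):
    """Gauss-Jordan elimination over Fractions; assumes M is non-singular
    (as guaranteed by Lemma 5). Uses O(size^3) arithmetic operations."""
    size = len(M)
    A = [list(row) + [b] for row, b in zip(M, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(size):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][size] / A[r][r] for r in range(size)]


def prices_for_graph(gamma, a, eps, e, q):
    """Solve the m + n linear equations of Proposition 3 for a fixed
    transaction graph given by gamma[i] = Gamma(i). Returns (Lambda, P)."""
    n, m = len(e), len(q)
    phi = [{i for i in range(n) if j in gamma[i]} for j in range(m)]
    size = n + m                          # unknowns: lambda_1..n, p_1..m
    M = [[Fraction(0)] * size for _ in range(size)]
    rhs = [Fraction(0)] * size
    for i in range(n):                    # lambda_i * sum a - sum eps * p = e_i
        for j in gamma[i]:
            M[i][i] += a[i][j]
            M[i][n + j] -= eps[i][j]
        rhs[i] = Fraction(e[i])
    for j in range(m):                    # p_j * (q_j + sum eps) - sum a * lambda = 0
        M[n + j][n + j] = q[j] + sum(eps[i][j] for i in phi[j])
        for i in phi[j]:
            M[n + j][i] -= a[i][j]
    sol = solve_exact(M, rhs)
    return sol[:n], sol[n:]


one = Fraction(1)
lam, p = prices_for_graph([{0}], [[one]], [[one]], [one], [one])
# lam == [2], p == [1]: the single buyer spends p * x = 1 * (2 - 1) = 1 = e
```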



The number of bipartite graphs with $m$ and $n$ vertices on the two sides is in general $2^{mn}$, which is exponential in both $m$ and $n$. In the following, we show how to reduce this number to a polynomial when either $m$ or $n$ is bounded by a constant.



Lemma 6. There is an algorithm solving the market equilibrium problem with logarithmic utility functions in $O((m + 2^m mn)^{m+2}(m+n)^3)$ time.

Proof. Due to Proposition 3, we only need to prove that there are at most $O((m + 2^m mn)^m)$ different possibilities for the graph $G$, or equivalently, different choices of $(\Gamma(1),\dots,\Gamma(n))$, and that it takes $O((m + 2^m mn)^{m+2})$ time to generate all these graphs.
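For each $i \in A$ and subset $\Delta \subseteq B$, let $I_{i,\Delta}$ denote the system of $m$ inequalities

$$\frac{e_i + \sum_{k\in\Delta} \epsilon_{i,k}p_k}{\sum_{k\in\Delta} a_{i,k}} > \frac{\epsilon_{i,j}}{a_{i,j}}p_j,\ \forall j \in \Delta; \qquad \frac{e_i + \sum_{k\in\Delta} \epsilon_{i,k}p_k}{\sum_{k\in\Delta} a_{i,k}} \le \frac{\epsilon_{i,j}}{a_{i,j}}p_j,\ \forall j \notin \Delta. \tag{6}$$

If $(\Gamma(1), \Gamma(2), \dots, \Gamma(n))$ (where $\Gamma(i) \subseteq B$) is a possible candidate for the transaction graph, then by Lemma 1, there must exist some $P \in \mathbb{R}^m_+$ such that $I_{i,\Gamma(i)}$ holds for every $i \in A$.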

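For each $i \in A$, subset $\Delta \subseteq B$ and $j \in B$, let $f_{i,\Delta,j}(p_1,\dots,p_m)$ denote the following linear function in the variables $p_1,\dots,p_m$:

$$f_{i,\Delta,j}(p_1,\dots,p_m) = \frac{e_i + \sum_{k\in\Delta} \epsilon_{i,k}p_k}{\sum_{k\in\Delta} a_{i,k}} - \frac{\epsilon_{i,j}}{a_{i,j}}p_j.$$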



Let $N$ be the number of all triplets $(i,\Delta,j)$; then $N \le nm2^m$. Let $P \in \mathbb{R}^m_+$ be an unknown price vector, and suppose we know the answers $\Psi(P) \in \{>, \le\}^N$ to all the binary comparison queries “$f_{i,\Delta,j}(P) > 0$?”. Then for each $i$ we can easily determine all the subsets $\Delta \subseteq B$ such that the system of inequalities $I_{i,\Delta}$ holds. By Lemma 1, there is in fact exactly one such $\Delta$, which we call $\Gamma(i)$. In this fashion, each possible $\Psi(P)$ leads to one candidate graph $G(P)$, as specified by $(\Gamma(1),\dots,\Gamma(n))$; this computation can be done in time $O(N)$.

To prove the lemma, it suffices to show that there are at most $O(N^m)$ distinct $\Psi(P)$'s, and that they can be generated in time $O((N+m)^{m+2})$. In $\mathbb{R}^m$, consider the $N + m$ hyperplanes $f_{i,\Delta,j}(p_1,\dots,p_m) = 0$ and $p_i = 0$. They divide the space into regions depending on the answers to the queries “$f_{i,\Delta,j}(p_1,\dots,p_m) > 0$?” and “$p_i > 0$?”. For any region in $\mathbb{R}^m$, all the points $P$ in it share the same $\Psi(P)$. It is well known that there are at most $O((N+m)^m)$ regions. We can enumerate all these regions and compute their $\Psi(P)$'s in time $O((N+m)^{m+2})$, by adding the hyperplanes one at a time and updating the regions using linear programming in $\mathbb{R}^m$. □
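The region bound used above is the classical count for hyperplane arrangements: $N'$ affine hyperplanes cut $\mathbb{R}^d$ into at most $\sum_{i=0}^{d}\binom{N'}{i}$ regions, with equality in general position, which is $O(N'^d)$ for fixed $d$. A small helper (hypothetical name) evaluates this bound, for instance for the $N + m$ hyperplanes in $\mathbb{R}^m$ of the proof; the concrete values of $m$ and $n$ below are purely illustrative.

```python
from math import comb


def max_regions(num_hyperplanes: int, dim: int) -> int:
    """Maximum number of regions into which num_hyperplanes affine
    hyperplanes can divide R^dim (attained in general position)."""
    return sum(comb(num_hyperplanes, i) for i in range(dim + 1))


# Three lines in the plane give at most 7 regions.
print(max_regions(3, 2))        # 7
# With m = 2 goods and n = 3 buyers, N <= n * m * 2**m = 24, so the proof
# works with at most N + m = 26 hyperplanes in R^2.
print(max_regions(24 + 2, 2))   # 352
```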



Lemma 6 solves the market equilibrium problem in the price space $\mathbb{R}^m$. By the dual relationship, we can also solve the problem in the “bias factor” space $\mathbb{R}^n$ in an almost verbatim manner.



Lemma 7. There is an algorithm solving the market equilibrium problem with logarithmic utility functions in $O((m + 2^n mn)^{n+2}(m+n)^3)$ time.

From Lemma 6 and Lemma 7, we have the following conclusion. 

Theorem 1. If the number of buyers, or goods, is bounded by a constant, then 
there is a polynomial time algorithm solving the Market equilibrium problem with 
logarithmic utility functions. 
5 Conclusion and Discussion

In this paper, we have derived a duality relation for the market equilibrium problem with logarithmic utility functions, and used it to develop a polynomial time algorithm when the number of either the buyers or the sellers is bounded by a constant. The techniques developed may be of some help for general concave utility functions.

We note that, recently, Devanur and Vazirani extended their approach to design a PTAS for another special type of value functions for the buyers, as they considered concave utility functions very difficult to deal with by their approach [6]. At the same time, Codenotti and Varadarajan [2] studied how to compute the equilibrium price for Leontief utility functions.
Abstract. We study approximation algorithms and hardness of approximation for several versions of the problem of packing Steiner trees. For packing edge-disjoint Steiner trees of undirected graphs, we show APX-hardness for 4 terminals. For packing Steiner-node-disjoint Steiner trees of undirected graphs, we show a logarithmic hardness result, and give an approximation guarantee of $O(\sqrt{n}\log n)$, where $n$ denotes the number of nodes. For the directed setting (packing edge-disjoint Steiner trees of directed graphs), we show a hardness result of $\Omega(m^{1/3-\epsilon})$ and give an approximation guarantee of $O(m^{1/2+\epsilon})$, where $m$ denotes the number of edges. The paper has several other results.



1 Introduction 

We study approximation algorithms and hardness (of approximation) for several 
versions of the problem of packing Steiner trees. Given an undirected graph $G = (V, E)$ and a set of terminal nodes $T \subseteq V$, a Steiner tree is a connected, acyclic subgraph that contains all the terminal nodes (nonterminal nodes, which are called Steiner nodes, are optional). The basic problem of Packing Edge-disjoint Undirected Steiner trees (PEU for short) is to find as many edge-disjoint Steiner trees as possible. Besides PEU, we study some other versions (see below for details).
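The definition of a Steiner tree and the feasibility condition of PEU can be made concrete with a small checker. These are hypothetical helpers, not from the paper: they assume the given edges are already edges of $G$, treat them as undirected, test that they form a connected, acyclic subgraph containing all terminals (via union-find), and test that a collection of trees is edge-disjoint.

```python
def is_steiner_tree(tree_edges, terminals):
    """True iff tree_edges (undirected pairs) form a connected, acyclic
    subgraph that contains every terminal."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for u, v in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False                     # edge would close a cycle
        parent[root_u] = root_v
    nodes = {x for edge in tree_edges for x in edge} | set(terminals)
    return len({find(x) for x in nodes}) == 1


def are_edge_disjoint(trees):
    """True iff no undirected edge is used by two of the given trees."""
    seen = set()
    for tree in trees:
        for u, v in tree:
            key = frozenset((u, v))
            if key in seen:
                return False
            seen.add(key)
    return True


terminals = {"a", "b", "c"}
tree1 = [("a", "s"), ("s", "b"), ("s", "c")]   # uses Steiner node s
tree2 = [("a", "b"), ("b", "c")]
print(is_steiner_tree(tree1, terminals), is_steiner_tree(tree2, terminals))  # True True
print(are_edge_disjoint([tree1, tree2]))                                     # True
```

Here the two trees form a feasible PEU packing of size 2 for the terminal set $\{a, b, c\}$.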

The PEU problem in its full generality has applications in VLSI circuit design 
(e.g., see [11,21]). Other applications include multicasting in wireless networks 
(see [7]) and broadcasting large data streams (such as videos) over the Internet 
(see [14]). There is significant motivation from the areas of graph theory and 
combinatorial optimization. Menger’s theorem on packing edge-disjoint $s,t$-paths
[5] corresponds to the special case of packing edge-disjoint Steiner trees on two 
Dept. of Comb. and Opt., University of Waterloo.



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 180-191, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 



Hardness and Approximation Results for Packing Steiner Trees 



181 



terminal nodes (i.e., T = {s,t}). Another special case is when all the nodes are 
terminals (i.e., T = V). Then the problem is to find a maximum set of edge- 
disjoint spanning trees. This topic was studied in the 1960’s by graph theorists, 
and a min-max theorem was developed by Tutte and independently by Nash- 
Williams [5]. Subsequently, Edmonds and Nash- Williams derived such results in 
the more general setting of the matroid intersection theorem. One consequence 
is that efficient algorithms are available via the matroid intersection algorithm 
for the case of $T = V$. (Note that most problems on packing Steiner trees are NP-hard, so results from matroid optimization do not apply directly.) A set of nodes $S$ is said to be $\lambda$-edge connected if there exist $\lambda$ edge-disjoint paths between every two nodes of $S$. An easy corollary of the min-max theorem is that if the node set $V$ is $2k$-edge connected, then the graph has $k$ edge-disjoint spanning trees. Recently, Kriesell [19] conjectured an exciting generalization: if the set of terminals is $2k$-edge connected, then there exist $k$ edge-disjoint Steiner trees. He proved this for Eulerian graphs (by an easy application of
the splitting-off theorem). Note that a constructive proof of this conjecture may 
give a 2-approximation algorithm for PEU. Jain, Mahdian, and Salavatipour [14] 
gave an approximation algorithm with guarantee (roughly) ^ . Moreover, using 
a versatile and powerful proof technique (that we will borrow and apply in the 
design of our algorithms), they showed that the fractional version of PEU has 
an $\alpha$-approximation algorithm if and only if the minimum-weight Steiner tree problem has an $\alpha$-approximation algorithm (Theorem 4.1 in [14]). The latter
problem is well studied and is known to be APX-hard. It follows that PEU is 
APX-hard (Corollary 4.3 in [14]). Kaski [16] showed that the problem of finding 
two edge-disjoint Steiner trees is NP-hard, and moreover, problem PEU is NP- 
hard even if the number of terminals is 7. Frank et al. [8] gave a 3-approximation 
algorithm for a special case of PEU (where no two Steiner nodes are adjacent). 
Very recently Lau [20], based on the result of Frank et al. [8], has given an $O(1)$
approximation algorithm for PEU (but Kriesell’s conjecture remains open). 

Using the fact that PEU is APX-hard, Floreen et al. [7] showed that packing 
Steiner-node-disjoint Steiner trees (see problem PVU defined below) is APX- 
hard. They raised the question whether this problem is in the class APX. Also, 
they showed that the special case of the problem with 4 terminal nodes is NP- 
hard. 

Our Results: 

We use n and m to denote the number of nodes and edges, respectively. The 
underlying assumption for most of our hardness results is P ≠ NP.

• Packing Edge Disjoint Undirected Steiner Trees (PEU): For this setting, we
show that the maximization problem with only four terminal nodes is APX- 
hard. This result appears in Section 3. (An early draft of our paper also proved, 
independently of [16], that finding two edge-disjoint Steiner trees is NP-hard.) 

• Packing Vertex Capacitated Undirected Steiner Trees (PVCU): We are given an undirected graph $G$, a set $T \subseteq V$ of terminals, and a positive vertex capacity $c_v$ for each Steiner vertex $v$. The goal is to find the maximum number of Steiner trees such that the total number of trees containing each Steiner vertex $v$ is at most $c_v$. The special case where all the capacities are 1 is the problem of Packing
Vertex Disjoint Undirected Steiner Trees (PVU for short). Note that here and 
elsewhere we use “vertex disjoint Steiner trees” to mean trees that are disjoint 
on the Steiner nodes (and of course they contain all the terminals). We show 
essentially the same hardness results for PVU as for PEU, that is, finding two 
vertex disjoint Steiner trees is NP-hard and the maximization problem for a con- 
stant number of terminals is APX-hard. For an arbitrary number of terminals, 
we prove an $\Omega(\log n)$-hardness result (lower bound) for PVU, and give an approximation guarantee (upper bound) of $O(\sqrt{n}\log n)$ for PVCU, by an LP-based
rounding algorithm. This shows that PVU is significantly harder than PEU, and 
this settles (in the negative) an open question of Floreen et al. [7] (mentioned 
above). These results appear in Section 3. Although the gap for PVU between 
our hardness result and the approximation guarantee is large, we tighten the 
gap for another natural generalization of PVU, namely Packing Vertex Capac- 
itated Priority Steiner Trees (PVCU-priority for short), which is motivated by 
the Quality of Service in network design problems (see [4] for applications). For this priority version, we show a lower bound of $\Omega(n^{1/3-\epsilon})$ on the approximation guarantee; moreover, our approximation algorithm for PVCU extends to PVCU-priority to give a guarantee of $O(n^{1/2+\epsilon})$; see Subsection 3.1. (Throughout, we use $\epsilon$ to denote any positive real number.)

• Packing Edge/Vertex Capacitated Directed Steiner Trees (PECD and PVCD): Consider a directed graph $G(V,E)$ with a positive capacity $c_e$ for each edge $e$, a set $T \subseteq V$ of terminals, and a specified root vertex $r$, $r \in T$. A directed Steiner tree rooted at $r$ is a subgraph of $G$ that contains a directed path from $r$ to $t$, for each terminal $t \in T$. In the problem of Packing Edge Capacitated Directed Steiner trees (PECD) the goal is to find the maximum number of directed Steiner trees rooted at $r$ such that for every edge $e \in E$ the total number of trees containing $e$ is at most $c_e$. We prove an $\Omega(m^{1/3-\epsilon})$-hardness result even for unit-capacity PECD (i.e., packing edge-disjoint directed Steiner trees), and also provide an approximation algorithm with a guarantee of $O(m^{1/2+\epsilon})$. Moreover, we show the NP-hardness of the problem of finding two edge-disjoint directed Steiner trees with only three terminal nodes. We also consider the problem of Packing Vertex Capacitated Directed Steiner trees (PVCD), where instead of capacities on the edges we have a capacity $c_v$ on every Steiner node $v$, and the goal is to find the maximum number of directed Steiner trees such that the number of trees containing any Steiner node $v$ is at most $c_v$. For directed graphs, PECD and PVCD are quite similar and we get the following results on the approximation guarantee for the latter problem: a lower bound of $\Omega(n^{1/3-\epsilon})$ (even for the special case of unit capacities), and an upper bound of $O(n^{1/2+\epsilon})$. These results appear in Section 2.

In summary, with the exception of PVCU, the approximation guarantees 
(upper bounds) and hardness results (lower bounds) presented in this paper and 
the previous works [7,14,16,20] are within the same class with respect to the 
classification in Table 10.2 in [1]. 
Comments:

Several of our proof techniques are inspired by results for disjoint-paths problems 
in the papers by Guruswami et al. [9] , Baveja and Srinivasan [2] , and Kolliopoulos 
and Stein [17]. (In these problems, we are given a graph and a set of source-sink 
pairs, and the goal is to find a maximum set of edge/node disjoint source-sink 
paths.) To the best of our knowledge, there is no direct relation between Steiner 
tree packing problems and disjoint-paths problems - neither problem is a special 
case of the other one. (In both problems, increasing the number of terminals 
seems to increase the difficulty for approximation, but each Steiner tree has 
to connect together all the terminals, whereas in the disjoint-paths problems 
the goal is to connect as many source-sink pairs as possible by disjoint paths.) 
There is a common generalization of both these problems, namely, the problem 
of packing Steiner trees with different terminal sets (given $\ell$ sets of terminals $T_1, T_2, \ldots, T_\ell$, with $\ell$ polynomial in $n$, find a maximum set of edge-disjoint, or Steiner-node-disjoint, Steiner trees where each tree contains one of the terminal sets $T_i$ ($1 \le i \le \ell$)). We chose to focus on our special cases (with identical terminal sets)
because our hardness results are of independent interest, and moreover, the best 
approximation guarantees for the special cases may not extend to the general 
problems. 

For the problems PVCU, PVCU-priority, PECD, and PVCD, we are not 
aware of any previous results on approximation algorithms or hardness results 
other than [7], although there is extensive literature on approximation algo- 
rithms for the corresponding minimum-weight Steiner tree problems (e.g., [3] 
for minimum-weight directed Steiner trees and [12] for minimum-node-weighted 
Steiner trees). 

Due to space constraints, most of our proofs are deferred to the full version 
of the paper. Very recently, we have obtained new results that substantially 
decrease the gap between the upper bounds and the lower bounds for PVU; 
these results will appear elsewhere. 



2 Packing Directed Steiner Trees 

In this section, we study the problem of packing directed Steiner trees. We show
that Packing Vertex Capacitated Directed Steiner trees (PVCD) and the edge 
capacitated version (PECD) are as hard as each other in terms of approximabil- 
ity (similarly for the unit capacity cases). Then we present the hardness results 
for PECD with unit capacities (i.e., edge-disjoint directed case), which immedi- 
ately implies similar hardness results for PVCD with unit capacities (i.e., vertex- 
disjoint directed case). We also present an approximation algorithm for PECD 
which implies a similar approximation algorithm for PVCD. The proof of the 
following theorem is easy. The idea for the first direction is to take the line graph, 
and for the second one is to split every vertex into two adjacent vertices. 
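The second direction (PVCD to PECD) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not code from the paper: the digraph is a dict mapping every vertex to its list of out-neighbours, `cap` gives $c_v$ for each Steiner vertex, and the hypothetical helper `split_vertices` leaves the internal edge of a terminal unbounded (`None`). Each vertex $v$ becomes an edge $v_{in} \to v_{out}$ that carries $v$'s capacity, so a packing respecting vertex capacities in $G$ corresponds to one respecting edge capacities in $G'$.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Tuple[Hashable, str]   # (original vertex, "in" or "out")
Arc = Tuple[Node, Node]


def split_vertices(
    adj: Dict[Hashable, List[Hashable]],  # every vertex must appear as a key
    cap: Dict[Hashable, int],             # c_v for each Steiner vertex
    terminals: Set[Hashable],             # includes the root
    root: Hashable,
) -> Tuple[Dict[Arc, Optional[int]], Set[Node], Node]:
    """Vertex-capacitated digraph -> edge-capacitated digraph (None = unbounded)."""
    arc_cap: Dict[Arc, Optional[int]] = {}
    for v in adj:
        # v_in -> v_out carries v's capacity; terminals are left unbounded.
        arc_cap[((v, "in"), (v, "out"))] = None if v in terminals else cap[v]
    for u, outs in adj.items():
        for v in outs:
            # An original arc u -> v becomes u_out -> v_in with no capacity limit.
            arc_cap[((u, "out"), (v, "in"))] = None
    new_root: Node = (root, "out")
    new_terminals = {(t, "in") for t in terminals if t != root} | {new_root}
    return arc_cap, new_terminals, new_root
```

In this sketch only Steiner vertices give rise to finitely capacitated edges, and $G'$ has exactly twice as many vertices as $G$.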



Theorem 1. Given an instance $I = (G(V,E), T \subseteq V, k)$ of PECD (of PVCD), there is an instance $I' = (G'(V',E'), T' \subseteq V', k)$ of PVCD (of PECD) with $|G'| = \mathrm{poly}(|G|)$, such that $I$ has $k$ directed Steiner trees satisfying the capacities of the edges (vertices) if and only if $I'$ has $k$ directed Steiner trees satisfying the capacities of the vertices (edges).

2.1 Hardness Results 

First we prove that PVCD with unit capacities is NP-hard even in the simplest 
non-trivial case where there are only three terminals (one root r and two other 
terminals) and we are asked to find only 2 vertex-disjoint Steiner trees. The 
problem becomes trivially easy if any of these two conditions is tighter, i.e., if 
the number of terminals is reduced to 2 or the number of Steiner trees that we 
have to find is reduced to 1. If the number of terminals is arbitrary, then we show 
that PVCD with unit capacities is NP-hard to approximate within a guarantee 
of $O(n^{1/3-\epsilon})$ for any $\epsilon > 0$. The proof is relatively simple and does not rely on the
PCP theorem. Also, as we mentioned before, both of these hardness results carry 
over to PECD. For both reductions, we use the following well-known NP-hard 
problem (see [10]): 

Problem: 2DIRPATH: 

Instance: A directed graph $G(V,E)$ and distinct vertices $x_1, y_1, x_2, y_2 \in V$.
Question: Are there two vertex-disjoint directed paths in $G$, one from $x_1$ to $y_1$ and the other from $x_2$ to $y_2$?
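For intuition only, here is a brute-force Python decision procedure for 2DIRPATH on tiny graphs; it takes exponential time, as one expects for an NP-hard problem, and the function names are our own. It uses the fact that if two vertex-disjoint paths exist, the path from $x_1$ to $y_1$ can be taken to be simple, so it suffices to try every simple path from $x_1$ to $y_1$ and search for a path from $x_2$ to $y_2$ that avoids its vertices.

```python
from typing import Dict, Hashable, Iterator, List, Set

Digraph = Dict[Hashable, List[Hashable]]  # vertex -> out-neighbours


def simple_paths(adj: Digraph, src: Hashable, dst: Hashable) -> Iterator[List[Hashable]]:
    """Yield every simple directed path from src to dst (exponentially many in general)."""
    path = [src]

    def extend(u: Hashable) -> Iterator[List[Hashable]]:
        if u == dst:
            yield list(path)
            return
        for w in adj.get(u, []):
            if w not in path:
                path.append(w)
                yield from extend(w)
                path.pop()

    yield from extend(src)


def reachable_avoiding(adj: Digraph, src: Hashable, dst: Hashable, banned: Set[Hashable]) -> bool:
    """Is there a directed path from src to dst that uses no vertex of banned?"""
    if src in banned or dst in banned:
        return False
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for w in adj.get(u, []):
            if w not in seen and w not in banned:
                seen.add(w)
                stack.append(w)
    return False


def two_dir_path(adj: Digraph, x1: Hashable, y1: Hashable, x2: Hashable, y2: Hashable) -> bool:
    """Brute-force 2DIRPATH: vertex-disjoint paths x1 -> y1 and x2 -> y2?"""
    return any(reachable_avoiding(adj, x2, y2, set(p)) for p in simple_paths(adj, x1, y1))
```

For example, if both pairs are forced through a common vertex `a`, the answer is "No".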

Theorem 2. Given an instance I of PVCD with unit capacities and only three 
terminals, it is NP-hard to decide if it has 2 vertex-disjoint directed Steiner trees. 

From Theorems 1 and 2, it follows that: 

Theorem 3. Given an instance I of PECD with unit capacities and only three 
terminals, it is NP-hard to decide if it has 2 edge-disjoint directed Steiner trees. 

Now we show that, unless P=NP, any approximation algorithm for PVCD 
with unit capacities has a guarantee of $\Omega(n^{1/3-\epsilon})$.

Theorem 4. Given an instance of PVCD with unit capacities, it is NP-hard to 
approximate the solution within $O(n^{1/3-\epsilon})$ for any $\epsilon > 0$.

Proof. We use a reduction from the 2DIRPATH problem. Our proof is inspired by a reduction used in [9] for the edge-disjoint paths problem. Assume that $I = (G, x_1, y_1, x_2, y_2)$ is an instance of 2DIRPATH and let $\epsilon > 0$ be given. We construct a directed graph $H$. First we construct a graph $G'$ whose underlying structure is shown in Figure 1. For $N = |V(G)|^{1/\epsilon}$, create two sets of vertices $A = \{a_1, \ldots, a_N\}$ and $B = \{b_1, \ldots, b_N\}$. In the figure, all the edges are directed from top to bottom and from left to right. For each grey box, there is a vertex at each of the four corners, and there are two edges, one from left to right and one from top to bottom.

Fig. 1. Construction of H: each grey box will be replaced with a copy of G

This graph may be viewed as the union of $N$ vertex-disjoint directed trees $T_1, \ldots, T_N$, where $T_i$ is rooted at $a_i$ and has paths to all the vertices in $B - \{b_i\}$. Each tree $T_i$ consists of one horizontal path, which is essentially the $i$th horizontal row above and starts with $a_i$, together with $N - 1$ vertical paths $P^i_j$ ($1 \le j \ne i \le N$), such that each of these vertical paths branches out from the horizontal path, starting at vertex $v^i_j$ and ending at vertex $b_j \in B$. Each vertex $v^i_j$ is in the $i$th horizontal row, and is the start vertex of a vertical path that ends at $b_j$; note that there are no vertices $v^i_i$. Also, note that each grey box corresponds to a triple $(i, \ell, j)$, where the box is in the $i$th horizontal line and is in the vertical path that starts at $v^\ell_j$ and ends at $b_j$; the corner vertices of the grey box are labeled $s^i_{\ell j}, t^i_{\ell j}, p^i_{\ell j}, q^i_{\ell j}$ for the left, right,
top, and bottom corners, respectively. More specifically, for $T_1$ the horizontal and vertical paths are $H^1 = a_1, v^1_2, \ldots, v^1_N$ and $P^1_j = v^1_j, b_j$, for $2 \le j \le N$. For $T_2$ the horizontal and vertical paths are $H^2 = a_2, v^2_1, s^2_{31}, t^2_{31}, \ldots, v^2_N$ and $P^2_j = v^2_j, p^1_{2j}, q^1_{2j}, b_j$ (for $1 \le j \ne 2 \le N$). In general, for $T_i$:

$$H^i = a_i, \ldots, v^i_2, s^i_{(i+1)2}, t^i_{(i+1)2}, \ldots, s^i_{N2}, t^i_{N2}, v^i_3, \ldots, s^i_{(i+1)(N-1)}, t^i_{(i+1)(N-1)}, \ldots, s^i_{N(N-1)}, t^i_{N(N-1)}, v^i_N,$$

and, for $j \ne i \le N$, $P^i_j = v^i_j, p^{i-1}_{ij}, q^{i-1}_{ij}, \ldots, p^1_{ij}, q^1_{ij}, b_j$.



Graph $H$ is obtained from $G'$ by making the following modifications:

— For each grey box, with corresponding triple $(i, \ell, j)$ and with vertices $s^i_{\ell j}, t^i_{\ell j}, p^i_{\ell j}, q^i_{\ell j}$, we first remove the two edges of the box, and then we place a copy of graph $G$ and identify vertices $x_1, y_1, x_2$, and $y_2$ with $s^i_{\ell j}, t^i_{\ell j}, p^i_{\ell j}$, and $q^i_{\ell j}$, respectively.

— Add a new vertex $r$ (root) and create directed edges from $r$ to $a_1, \ldots, a_N$.

— Create $N$ new directed edges for $1 \le i \le N$.
The set of terminals (for the directed Steiner trees) is $B \cup \{r\} = \{b_1, \ldots, b_N, r\}$. The proof follows from the following two claims (we omit their proofs).

Claim 1: If $I$ is a “Yes” instance of 2DIRPATH, then $H$ has $N$ vertex-disjoint directed Steiner trees.

Claim 2: If $I$ is a “No” instance, then $H$ does not have two vertex-disjoint directed Steiner trees (so a maximum packing consists of exactly 1 tree).

The number of copies of $G$ in the construction of $H$ is $O(N^3)$, where $N = |V(G)|^{1/\epsilon}$. So the number of vertices in $H$ is $O(N^{3+\epsilon})$. By Claims 1 and 2, it is NP-hard to decide whether $H$ has at least $N$ or at most one vertex-disjoint directed Steiner trees. This creates a gap of $\Omega(n^{1/3-\epsilon})$. □
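The size and gap arithmetic of the last paragraph, written out under the assumption that the $O(N^3)$ copies of $G$ (each with $|V(G)| = N^{\epsilon}$ vertices) dominate the vertex count of $H$, the skeleton $G'$ itself having only $O(N^3)$ vertices:

$$n = |V(H)| = O(N^3)\cdot N^{\epsilon} = O(N^{3+\epsilon}) \;\Longrightarrow\; N = \Omega\big(n^{1/(3+\epsilon)}\big), \qquad \frac{1}{3+\epsilon} = \frac{1}{3}\cdot\frac{1}{1+\epsilon/3} \ge \frac{1}{3} - \frac{\epsilon}{9}.$$

Hence the ratio between the two cases, $N$ versus $1$, is $\Omega(n^{1/3-\epsilon/9})$, which in particular is $\Omega(n^{1/3-\epsilon})$.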

For PECD with unit capacities we use a reduction very similar to the one 
we presented for PVCD. The only differences are: (i) the instance that we use 
as the building block in our construction (corresponding to graph G above) 
is an instance of another well-known NP-hard problem, namely edge-disjoint 
2DIRPATH (instead of vertex-disjoint), and (ii) the parameter $N$ above is $|E(G)|^{1/\epsilon}$.
Using this reduction we can show: 

Theorem 5. Given an instance of PECD with unit capacities, it is NP-hard to 
approximate the solution within $O(m^{1/3-\epsilon})$ for any $\epsilon > 0$.



2.2 Approximation Algorithms 

In this section we show that, although PECD is hard to approximate within a ratio of $m^{1/3-\epsilon}$, there is an approximation algorithm with a guarantee of $O(m^{1/2+\epsilon})$ (details in Theorem 7). The algorithm is LP-based, with a simple rounding scheme similar to those in [2,17]. The main idea of the algorithm is to start with one of the known approximation algorithms for finding a minimum-weight directed Steiner tree. Using this and an extension of Theorem 4.1 in [14], we obtain an approximate solution to the fractional version of PECD. After that, a simple randomized rounding algorithm yields an integral solution. A similar method yields an approximation algorithm for PVCD that has a guarantee of $O(n^{1/2+\epsilon})$.

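We may formulate PECD as an integer program (IP). In the following, $\mathcal{T}$ denotes the collection of all directed Steiner trees in $G$.

$$\begin{aligned}
\text{maximize} \quad & \sum_{F \in \mathcal{T}} x_F \\
\text{subject to} \quad & \sum_{F \in \mathcal{T}:\, e \in F} x_F \le c_e \qquad \forall e \in E \qquad (1) \\
& x_F \in \{0,1\} \qquad \forall F \in \mathcal{T}
\end{aligned}$$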

The fractional packing edge capacitated directed Steiner tree problem (fractional PECD, for short) is the linear program (LP) obtained by relaxing the integrality condition in the above IP to $x_F \ge 0$. For any instance $I$ of the (integral) packing problem, we denote the fractional instance by $I_f$. The proof of Theorem 4.1 in [14] may be adapted to prove the following:
Theorem 6. There is an $\alpha$-approximation algorithm for fractional PECD if and only if there is an $\alpha$-approximation algorithm for the minimum (edge weighted) directed Steiner tree problem.

Charikar et al. [3] gave an $O(n^{\epsilon})$-approximation algorithm for the minimum-weight directed Steiner tree problem. This, together with Theorem 6, implies:

Corollary 1. There is an $O(n^{\epsilon})$-approximation algorithm for fractional PECD.

The key lemma in the design of our approximation algorithm for PECD is 
as follows. 

Lemma 1. Let $I$ be an instance of PECD, and let $\varphi^*$ be the (objective) value of a (not necessarily optimal) feasible solution $\{x^*_F : F \in \mathcal{T}\}$ to $I_f$ such that the number of non-zero $x^*_F$'s is polynomially bounded. Then we can find, in polynomial time, a solution to $I$ with value at least $\Omega(\max\{\varphi^*/\sqrt{m}, \min\{\varphi^{*2}/m, \varphi^*\}\})$.
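The proof of Lemma 1 is deferred to the full version, and the rounding schemes of [2,17] are not reproduced here. The Python sketch below only illustrates the general shape of a scale-then-round-then-discard procedure; the scaling factor `beta` (for instance $\sqrt{m}$), the alteration step and all names are our assumptions, and the sketch says nothing about the lemma's analysis. Trees are given as explicit edge sets with their values $x^*_F$ (polynomially many, as in the lemma); each tree is sampled with probability $\min(x^*_F/\beta, 1)$, and a sampled tree is discarded if it would exceed some edge capacity, so the output is always feasible.

```python
import random
from typing import Dict, FrozenSet, Hashable, List, Optional

Tree = FrozenSet[Hashable]  # a directed Steiner tree given by its edge set


def round_packing(
    frac: Dict[Tree, float],      # tree F -> fractional value x*_F
    cap: Dict[Hashable, int],     # edge e -> capacity c_e (must list every edge)
    beta: float,                  # scaling factor, beta >= 1
    rng: Optional[random.Random] = None,
) -> List[Tree]:
    """Randomized rounding with alteration (illustrative sketch)."""
    if rng is None:
        rng = random.Random()
    # Step 1: keep each tree independently with probability min(x*_F / beta, 1).
    sampled = [F for F, x in frac.items() if rng.random() < min(x / beta, 1.0)]
    rng.shuffle(sampled)
    # Step 2: accept sampled trees greedily while every edge has spare capacity.
    residual = dict(cap)
    packed: List[Tree] = []
    for F in sampled:
        if all(residual[e] > 0 for e in F):
            for e in F:
                residual[e] -= 1
            packed.append(F)
    return packed
```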

Theorem 7. Let $I$ be an instance of PECD. Then for any $\epsilon > 0$, we can find in polynomial time a set of directed Steiner trees (satisfying the edge capacity constraints) of size at least $\Omega(\max\{\varphi_f/m^{1/2+\epsilon}, \min\{\varphi_f^2/m^{1+2\epsilon}, \varphi_f/m^{\epsilon}\}\})$, where $\varphi_f$ is the optimal value of the fractional instance $I_f$.

Proof. Let $I_f$ be the fractional instance. By Corollary 1, we can find an approximate solution $\varphi^*$ for $I_f$ such that $\varphi^* \ge c\,\varphi_f/m^{\epsilon}$ for some constant $c$ and the given $\epsilon > 0$. Furthermore, the approximate solution contains only a polynomial number of Steiner trees with non-zero fractional values (this follows from the proof of Theorem 6, which is essentially the same as that of Theorem 4.1 in [14]). If we substitute $\varphi^*$ in Lemma 1, we obtain an approximation algorithm that finds a set $\mathcal{T}'$ of directed Steiner trees such that $\mathcal{T}'$ has the required size. □
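The substitution in the proof of Theorem 7 is mechanical: with $\varphi^* = \varphi_f/\alpha$, the three terms of Lemma 1 become $\varphi_f/(\alpha\sqrt{m})$, $\varphi_f^2/(\alpha^2 m)$ and $\varphi_f/\alpha$, up to constants. The small Python helpers below (hypothetical names; all constants dropped, i.e. $c = 1$ assumed) evaluate this expression. Taking $\alpha = m^{\epsilon}$ gives the terms of Theorem 7; the same pattern with $\alpha = \log n$ and $n$ in place of $m$ gives the terms of Theorem 13.

```python
import math


def lemma1_bound(phi_star: float, m: int) -> float:
    """max(phi*/sqrt(m), min(phi*^2/m, phi*)): Lemma 1's guarantee without constants."""
    return max(phi_star / math.sqrt(m), min(phi_star ** 2 / m, phi_star))


def rounded_bound(phi_f: float, m: int, alpha: float) -> float:
    """Plug an alpha-approximate fractional value phi* = phi_f / alpha into Lemma 1."""
    return lemma1_bound(phi_f / alpha, m)
```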

For PVCD we do the following. Given an instance $I$ of PVCD with graph $G(V,E)$ and $T \subseteq V$ (with $|V| = n$ and $|E| = m$), we first use the reduction presented in Theorem 1 to produce an instance $I'$ of PECD with graph $G'(V',E')$ and $T' \subseteq V'$. By the construction of $G'$, we have $|V'| = 2|V| = 2n$, and there are only $n$ edges in $E'$ with bounded capacities (corresponding to the vertices of $G$). Therefore, if we use the algorithm of Theorem 7, the number of bad events will be $n$ rather than $m$. Using this observation, we have the following:

Theorem 8. Let $I$ be an instance of PVCD. Then for any $\epsilon > 0$ we can find in polynomial time a set of directed Steiner trees (satisfying the vertex capacity constraints) of size at least $\Omega(\max\{\varphi_f/n^{1/2+\epsilon}, \min\{\varphi_f^2/n^{1+2\epsilon}, \varphi_f/n^{\epsilon}\}\})$, where $\varphi_f$ is the optimal value of the fractional instance $I_f$.



3 Packing Undirected Steiner Trees 

For packing edge-disjoint undirected Steiner trees (PEU), Jain et al. [14] showed 
that the (general) problem is APX-hard, and Kaski [16] showed the special case 
of the problem with only 7 terminal nodes is NP-hard. Here we show that PEU is
APX-hard even when there are only 4 terminals. In an early draft of this paper we 
also showed, independently of [16], that finding two edge-disjoint Steiner trees is 
NP-hard. Both of these hardness results carry over (using similar constructions) 
to PVU. The following observation will be used in our proofs: 

Observation 2. For any solution to any of our Steiner tree packing problems, we may assume that: (1) in any Steiner tree, none of the leaves is a Steiner node (otherwise we simply remove it); (2) every Steiner node with degree 3 belongs to at most one Steiner tree.
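Part (1) of Observation 2 is a simple clean-up step. A minimal Python sketch, assuming a tree is stored as a symmetric adjacency dict (the function name is ours): repeatedly delete Steiner leaves until every leaf is a terminal. Deleting a leaf never disconnects the rest of the tree, and it can only free capacity for the other trees of the packing.

```python
from typing import Dict, Hashable, Set


def prune_steiner_leaves(
    tree: Dict[Hashable, Set[Hashable]], terminals: Set[Hashable]
) -> Dict[Hashable, Set[Hashable]]:
    """Return a copy of the tree with Steiner leaves removed repeatedly."""
    adj = {v: set(nbrs) for v, nbrs in tree.items()}
    leaves = [v for v, nbrs in adj.items() if len(nbrs) <= 1 and v not in terminals]
    while leaves:
        v = leaves.pop()
        if v not in adj:
            continue  # already deleted via an earlier entry
        for w in adj.pop(v):
            adj[w].discard(v)
            # w may have become a Steiner leaf itself.
            if len(adj[w]) <= 1 and w not in terminals:
                leaves.append(w)
    return adj
```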

Using this observation, the proof of NP-hardness for finding two edge-disjoint 
Steiner trees implies the following theorem. 

Theorem 9. Finding 2 vertex-disjoint undirected Steiner trees is NP-hard.

Theorem 10. PEU is APX-hard even if there are only 4 terminals. 

Proof. We use a reduction from Bounded 3-Dimensional Matching (B3DM). Assume that we are given three disjoint sets $X, Y, Z$ (each corresponding to one part of a 3-partite graph $G$), with $|X| = |Y| = |Z| = n$, and a set $E \subseteq X \times Y \times Z$ containing $m$ triples. Furthermore, we assume that each vertex in $X \cup Y \cup Z$ belongs to at most 5 triples. It is known [15] that there is an absolute constant $\epsilon_0 > 0$ such that it is NP-hard to distinguish between instances of B3DM where there is a perfect matching (i.e., $n$ vertex-disjoint triples) and those in which every matching (set of vertex-disjoint triples) has size at most $(1 - \epsilon_0)n$. Assume that $x_1, \ldots, x_n$, $y_1, \ldots, y_n$, and $z_1, \ldots, z_n$ are the nodes of $X$, $Y$, and $Z$, respectively. We construct a graph $H$ which consists of:

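— 4 terminals $t_x$, $t_y$, $t_z$, and $t_{yz}$.

— Non-terminals $x_1, \ldots, x_n$, $y_1, \ldots, y_n$, and $z_1, \ldots, z_n$ (corresponding to the nodes in $X, Y, Z$), $x'_1, \ldots, x'_{m-n}$, $y'_1, \ldots, y'_{m-n}$, and $z'_1, \ldots, z'_n$.

— Two non-terminals $U$ and $W$.

— Edges $t_x x_i$, $t_y y_i$, $t_z z_i$, $z'_i z_i$, and $t_{yz} z'_i$, for $1 \le i \le n$.

— Edges $t_x x'_i$, $x'_i y'_i$, $x'_i U$, $y'_i t_y$, and $y'_i t_{yz}$, for $1 \le i \le m - n$.

— $m - n$ parallel edges from $W$ to $t_z$.

— For each triple $e_\ell = x_i y_j z_k \in E$, 3 non-terminals $v^1_\ell, v^2_\ell, v^3_\ell$ and the following edges: $v^1_\ell x_i$, $v^1_\ell z_k$, $v^1_\ell v^2_\ell$, $v^2_\ell y_j$, $v^3_\ell U$, and $v^3_\ell W$.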

See Figure 2. We can show that (a) [completeness] if $G$ has a perfect matching, then $H$ has $m$ edge-disjoint Steiner trees, and (b) [soundness] if every matching in $G$ has size at most $(1 - \epsilon_0)n$, then $H$ has at most $(1 - \epsilon_1)m$ edge-disjoint Steiner trees, for $\epsilon_1 \ge \epsilon_0/110$. □

Fig. 2. Construction with 4 terminals from B3DM

The constant 110 in the above theorem is not optimal. We can find explicit lower bounds for the hardness of PEU with a constant number of terminals using the known hardness results for $k$DM, for higher values of $k$. For instance, Hazan et al. [13] proved that 4DM (with upper bounds on the degree of each vertex) is hard to approximate within an explicit constant factor. Using this and a reduction similar to the one presented in Theorem 10, it seems possible to show that PEU with 5 terminals is hard to approximate within an explicit factor of the form $1 + \delta$, for a constant $\delta > 0$. The proof of Theorem 10 extends to give the next result.

Theorem 11. PVU is APX-hard even with only 6 terminals. 

These results may seem to suggest that PEU and PVU have the same approximation guarantees. But the next theorem shows that PVU is significantly harder than PEU. We show this by a reduction from the set-cover packing problem (or domatic number problem). Given a bipartite graph $G(V_1 \cup V_2, E)$, a set-cover (of $V_2$) is a subset $S \subseteq V_1$ such that every vertex of $V_2$ has a neighbor in $S$. A set-cover packing is a collection of pairwise disjoint set-covers of $V_2$. The goal is to find a packing of set-covers of maximum size. Feige et al. [6] show that, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log\log n})$, there is no $(1 - \epsilon)\ln n$ approximation algorithm for set-cover packing, where $n = |V_1| + |V_2|$. We have the following theorem.

Theorem 12. PVU cannot be approximated within ratio $(1 - \epsilon)\log n$, for any $\epsilon > 0$, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log\log n})$.
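To make the set-cover packing problem used for Theorem 12 concrete, here is a small Python checker (our own illustration). It assumes that `nbrs[u]` holds the neighbours in $V_1$ of each $u \in V_2$ and that every candidate set is a subset of $V_1$; a packing is valid when its sets are pairwise disjoint and each of them covers $V_2$.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def is_set_cover(s: Set[Hashable], v2: Set[Hashable], nbrs: Dict[Hashable, Set[Hashable]]) -> bool:
    """s (a subset of V1) is a set-cover of V2 if every u in V2 has a neighbour in s."""
    return all(nbrs[u] & s for u in v2)


def is_set_cover_packing(
    packing: List[Set[Hashable]], v2: Set[Hashable], nbrs: Dict[Hashable, Set[Hashable]]
) -> bool:
    """Pairwise disjoint set-covers of V2; the packing's size is len(packing)."""
    disjoint = all(not (a & b) for a, b in combinations(packing, 2))
    return disjoint and all(is_set_cover(s, v2, nbrs) for s in packing)
```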

On the other hand, we can obtain an $O(\sqrt{n}\log n)$ algorithm for PVCU (which contains PVU as a special case). To do so, consider the fractional version of PVCU obtained by relaxing the integrality condition in the IP formulation. The separation problem for the dual of this LP is the minimum node-weighted Steiner tree problem. For this problem, Guha and Khuller [12] give an $O(\log n)$ approximation algorithm. Using the following analog of Theorem 6 (or Theorem 4.1 in [14]), we obtain a polynomial-time $O(\log n)$ approximation for fractional PVCU.
Lemma 3. There is an $\alpha$-approximation for fractional PVCU if and only if there is an $\alpha$-approximation for the minimum node weighted Steiner tree problem.

Remark: Lemma 3 and the fact that the minimum node weighted Steiner tree problem is hard to approximate within $o(\log k)$ (with $k$ being the number of terminals) yield an alternative proof of the $\Omega(\log k)$ hardness of PVCU.

The algorithm for PVCU is similar to the ones we presented for PECD and 
PVCD. That is, we apply randomized rounding to the solution of the fractional 
PVCU instance. Skipping the details, this yields the following: 

Theorem 13. Given an instance of PVCU and any $\epsilon > 0$, we can find in polynomial time a set of Steiner trees (satisfying the vertex capacity constraints) of size at least $\Omega(\max\{\varphi_f/(\sqrt{n}\log n), \min\{\varphi_f^2/(n\log^2 n), \varphi_f/\log n\}\})$, where $\varphi_f$ is the optimal value of the instance of fractional PVCU.

3.1 Packing Vertex-Disjoint Priority Steiner Trees 

The priority Steiner problem has been studied by Charikar et al. [4]. Here, we study the problem of packing vertex-disjoint priority Steiner trees of undirected graphs. (One difference with the earlier work in [4] is that weights and priorities are associated with vertices rather than with edges.) Consider an undirected graph $G = (V, E)$ with a set of terminals $T \subseteq V$, one of which is distinguished as the root $r$; every vertex $v$ has a nonnegative integer $p_v$ as its priority, and every Steiner vertex $v \in V - T$ has a positive capacity $c_v$. A priority Steiner tree is a Steiner tree such that for each terminal $t \in T$, every Steiner vertex $v$ on the $r,t$ path has priority $p_v \ge p_t$. In the problem PVCU-priority (Packing Vertex Capacitated Undirected Priority Steiner Trees), the goal is to find a maximum set of priority Steiner trees obeying vertex capacities (i.e., for each Steiner vertex $v \in V - T$, the number of trees containing $v$ is at most $c_v$). The algorithm we presented for PVCU extends to PVCU-priority, giving roughly the same approximation guarantee.
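The priority condition is easy to test for a single candidate tree. A Python sketch with hypothetical names, assuming the tree is given as a symmetric adjacency dict and that the root is one of the terminals; it does not verify that the input is acyclic. A breadth-first search from $r$ records parents, and then each $r,t$ path is walked back from $t$.

```python
from collections import deque
from typing import Dict, Hashable, Optional, Set


def is_priority_steiner_tree(
    tree: Dict[Hashable, Set[Hashable]],  # undirected tree, symmetric adjacency
    root: Hashable,
    terminals: Set[Hashable],             # includes the root
    priority: Dict[Hashable, int],        # p_v for every vertex of the tree
) -> bool:
    # BFS from the root; parent[] encodes the unique root-to-vertex paths.
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in tree.get(u, ()):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    for t in terminals:
        if t not in parent:
            return False  # the tree does not reach terminal t
        v = parent[t]
        while v is not None:  # walk from t back towards the root
            if v not in terminals and priority[v] < priority[t]:
                return False  # a Steiner vertex on the r,t path has too low a priority
            v = parent[v]
    return True
```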

Theorem 14. Given an instance of PVCU-priority and any $\epsilon > 0$, we can find in polynomial time a set of priority Steiner trees (satisfying the vertex capacity constraints) of size at least $\Omega(\max\{\varphi_f/n^{1/2+\epsilon}, \min\{\varphi_f^2/n^{1+\epsilon}, \varphi_f/n^{\epsilon/2}\}\})$, where $\varphi_f$ is the optimal value of the instance of fractional PVCU-priority.

On the other hand, we prove an $\Omega(n^{1/3-\epsilon})$ hardness result for PVCU-priority by adapting the proof of Theorem 4 (thus improving on our logarithmic hardness result for PVCU). The main difference from the proof of Theorem 4 is that we use instances of the Undir-Node-USF problem (Undirected Node capacitated Unsplittable Flow), which is shown to be NP-complete in [9], instead of instances of 2DIRPATH as the modules that are placed on the “grey boxes” in Figure 1.

Theorem 15. Given an instance of PVCU-priority, it is NP-hard to approximate the solution within $O(n^{1/3-\epsilon})$ for any $\epsilon > 0$.
Abstract. We study approximation hardness of the Minimum Dominating Set problem and its variants in undirected and directed graphs.
We state the first explicit approximation lower bounds for various kinds 
of domination problems (connected, total, independent) in bounded de- 
gree graphs. For most of dominating set problems we prove asymptoti- 
cally almost tight lower bounds. The results are applied to improve the 
lower bounds for other related problems such as the Maximum Induced 
Matching problem and the Maximum Leaf Spanning Tree problem. 



1 Introduction 

A dominating set in a graph is a set of vertices such that every vertex in the 
graph is either in the set or adjacent to a vertex in it. The Minimum Dominat- 
ing Set problem (Min-DS) asks for a dominating set of minimum size. The 
variants of dominating set problems seek a minimum dominating set with some additional properties, e.g., to be independent, or to induce a connected graph. These problems arise in a number of distributed network applications, where the problem is to locate the smallest number of centers in a network such that every vertex is near at least one center.

Definitions. A dominating set $D$ in a graph $G$ is an independent dominating set if the subgraph $G_D$ of $G$ induced by $D$ has no edges; $D$ is a total dominating set if $G_D$ has no isolated vertices; and $D$ is a connected dominating set if $G_D$ is a connected graph. The corresponding problems Minimum Independent Dominating Set (Min-IDS), Minimum Total Dominating Set (Min-TDS), and Minimum Connected Dominating Set (Min-CDS) ask for an independent, total, and connected dominating set of minimum size, respectively. We use the prefix $B$- for the problem restricted to $B$-bounded graphs (graphs with maximum degree at most $B$). Let $ds(G)$ stand for the minimum cardinality of a dominating set in $G$. Similarly, let $ids(G)$, $tds(G)$, and $cds(G)$ stand for the corresponding minima

* Partially supported by the Science and Technology Assistance Agency grant APVT-20-018902.
for Min-IDS, Min-TDS, and Min-CDS for $G$, respectively. For definiteness, the corresponding optimal value is set to infinity if no feasible solution exists for $G$. That means $tds(G) < \infty$ iff $G$ has no isolated vertices, and $cds(G) < \infty$ iff $G$ is connected. It is easy to see that $ds(G) \le ids(G)$, $ds(G) \le tds(G)$, and $ds(G) \le cds(G)$. Moreover, $tds(G) \le cds(G)$ unless $ds(G) = 1$.
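The four notions of domination defined above can be checked directly from the definitions. A Python sketch, assuming an undirected graph stored as a dict of neighbour sets with symmetric entries (the function names are ours):

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected: symmetric neighbour sets


def is_dominating(g: Graph, d: Set[Hashable]) -> bool:
    # Every vertex is in D or adjacent to a vertex of D.
    return all(v in d or g[v] & d for v in g)


def is_independent_ds(g: Graph, d: Set[Hashable]) -> bool:
    # G_D has no edges.
    return is_dominating(g, d) and all(not (g[v] & d) for v in d)


def is_total_ds(g: Graph, d: Set[Hashable]) -> bool:
    # G_D has no isolated vertices.
    return is_dominating(g, d) and all(g[v] & d for v in d)


def is_connected_ds(g: Graph, d: Set[Hashable]) -> bool:
    # G_D is connected: a search inside D from one vertex reaches all of D.
    if not d or not is_dominating(g, d):
        return False
    start = next(iter(d))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in g[u] & d:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == d
```

On the path a-b-c-d, for example, {a, c} is an independent dominating set that is neither total nor connected, while {b, c} is total and connected but not independent.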

In fact, dominating set problems are closely tied to the well-known Minimum Set Cover problem (Min-SC). Let a set system $\mathcal{G} = (U, \mathcal{S})$ be given, where $U$ is a universe and $\mathcal{S}$ is a collection of (nonempty) subsets of $U$ such that $\bigcup\mathcal{S} = U$. Any subcollection $\mathcal{S}' \subseteq \mathcal{S}$ such that $\bigcup\mathcal{S}' = U$ is termed a set cover. Min-SC asks for a set cover of minimum cardinality, whose size is denoted by $sc(\mathcal{G})$.

An instance $\mathcal{G} = (U, \mathcal{S})$ of Min-SC can be viewed as a hypergraph $\mathcal{G}$ with vertices $U$ and hyperedges $\mathcal{S}$. For an element $x \in U$ let $\deg(x)$ denote the number of sets in $\mathcal{S}$ containing $x$, let $\deg(\mathcal{G}) := \max_{x \in U} \deg(x)$, and let $\Delta(\mathcal{G})$ denote the size of the largest set in $\mathcal{S}$. We will also consider the restriction of Min-SC to instances $\mathcal{G}$ with both parameters bounded, $\Delta(\mathcal{G}) \le k$ and $\deg(\mathcal{G}) \le d$, denoted by $(k,d)$-Min-SC. Hence, $(k,\infty)$-Min-SC corresponds to $k$-Min-SC, in which instances $\mathcal{G}$ are restricted to those with $\Delta(\mathcal{G}) \le k$.

For a hypergraph $\mathcal{G} = (U, \mathcal{S})$ define its dual hypergraph $\tilde{\mathcal{G}} = (\mathcal{S}, U_{\mathcal{G}})$ such that vertices in $\tilde{\mathcal{G}}$ are hyperedges of $\mathcal{G}$ and hyperedges $U_{\mathcal{G}} = \{x_{\mathcal{G}} : x \in U\}$ correspond to vertices of $\mathcal{G}$ in the following sense: given $x \in U$, $x_{\mathcal{G}}$ contains all $S \in \mathcal{S}$ such that $x \in S$ in $\mathcal{G}$. In the context of hypergraphs, Min-SC is the Minimum Vertex Cover problem (Min-VC) for the dual hypergraph. Clearly, $\deg$ and $\Delta$ are dual notions in the hypergraph duality. In fact, $(k,d)$-Min-SC is the same problem as $(d,k)$-Min-VC, but in its dual formulation.

Min-SC can be approximated by a natural greedy algorithm that iteratively adds a set that covers the highest number of yet uncovered elements. It provides an $\mathcal{H}_{\Delta(\mathcal{G})}$-approximation, where $\mathcal{H}_i := \sum_{j=1}^{i} 1/j$ is the $i$-th harmonic number. This factor has been improved to $\mathcal{H}_{\Delta(\mathcal{G})} - \frac{1}{2}$ [7]. Additionally, Feige [8] has shown that the approximation ratio of $\ln n$ for Min-SC is the best possible (as a function of $n := |U|$, up to a lower order additive term) unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$.
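The greedy algorithm described above, as a short Python sketch (names are ours; ties are broken by dictionary order, an arbitrary choice). `harmonic(k)` computes $\mathcal{H}_k$ exactly with `fractions.Fraction`; the improved $\mathcal{H}_{\Delta(\mathcal{G})} - \frac{1}{2}$ algorithm of [7] is not reproduced here.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable], sets: Dict[Hashable, FrozenSet[Hashable]]) -> List[Hashable]:
    """Repeatedly add a set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def harmonic(k: int) -> Fraction:
    """H_k = 1 + 1/2 + ... + 1/k."""
    return sum((Fraction(1, j) for j in range(1, k + 1)), Fraction(0))
```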

For any NP-hard optimization problem $\Pi$ one can define its approximation thresholds $t_P$ and $t_{NP}$ as follows: $t_P = \inf\{c \ge 1 : \text{there is a polynomial } c\text{-approximation algorithm for } \Pi\}$ and $t_{NP} = \sup\{c \ge 1 : \text{achieving approximation ratio } c \text{ for } \Pi \text{ is NP-hard}\}$. For definiteness, $\inf\emptyset := \infty$. Hence $t_P < \infty$ iff $\Pi$ is in APX. Further, $t_P = 1$ iff $\Pi$ has a PTAS. Clearly $t_{NP} \le t_P$ unless $\mathrm{P} = \mathrm{NP}$.

Our contributions. Due to a close relation of Min-DS to Min-SC, almost tight approximability results are known for Min-DS in general graphs. The best upper bound, which is logarithmic in the maximum degree of the graph, almost matches the lower bound. In this contribution we investigate the approximability of the dominating set problem and several of its variants in subclasses of general graphs, in bounded degree graphs, and in directed graphs.

In Section 2 we observe that similar approximation hardness results as for Min-DS in general graphs hold for Min-DS, Min-TDS, and Min-CDS in split or, alternatively, in bipartite graphs. For $B$-bounded graphs we prove in Section 3 asymptotically tight lower bounds of $\ln B$ for Min-DS, Min-TDS, and Min-CDS, even in bipartite graphs. We present a lower bound for $B$-Min-IDS that increases linearly with $B$, similar to the upper bound. All lower bounds summarized in the following table are new contributions of this paper and hold even in bipartite graphs; upper bounds are due to [1], [7], [10].





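$$\begin{array}{lll}
\text{Problem} & \text{Lower bound} & \text{Upper bound} \\ \hline
B\text{-Min-DS} & \ln B - C\ln\ln B & \mathcal{H}_{B+1} - \frac{1}{2} \\
B\text{-Min-CDS} & \ln B - C\ln\ln B & \mathcal{H}_{B} + 2 \\
B\text{-Min-TDS} & \ln B - C\ln\ln B & \mathcal{H}_{B} - \frac{1}{2} \\
B\text{-Min-IDS} & \delta B & B - \frac{B-1}{B^2+1}
\end{array}$$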



In Section 4 we introduce various kinds of reductions to achieve lower bounds for the $B$-Min-DS and $B$-Min-IDS problems for small values of $B$. These lower bounds are summarized in the table below (* means that the lower bound is achieved even in bipartite graphs), and the upper bounds follow from [1], [7].



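$$\begin{array}{lcc}
\text{Problem} & \text{Lower bound} & \text{Upper bound} \\ \hline
3\text{-Min-DS} & \frac{391}{390}^{\,*} & \frac{19}{12} \\
4\text{-Min-DS} & \frac{100}{99} & \frac{107}{60} \\
3\text{-Min-IDS} & \frac{681}{680} & 2 \\
4\text{-Min-IDS} & \frac{294}{293}^{\,*} & \frac{65}{17}
\end{array}$$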
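The fractional upper bounds in this table can be reproduced from the closed forms $\mathcal{H}_{B+1} - \frac{1}{2}$ for $B$-Min-DS and $B - \frac{B-1}{B^2+1}$ for $B$-Min-IDS. A quick exact check with Python's `fractions` module (helper names are ours):

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    return sum((Fraction(1, j) for j in range(1, k + 1)), Fraction(0))


def ds_upper(b: int) -> Fraction:
    """H_{B+1} - 1/2, the upper bound for B-Min-DS."""
    return harmonic(b + 1) - Fraction(1, 2)


def ids_upper(b: int) -> Fraction:
    """B - (B-1)/(B^2+1), the upper bound for B-Min-IDS."""
    return b - Fraction(b - 1, b * b + 1)


print(ds_upper(3), ds_upper(4), ids_upper(4))  # 19/12 107/60 65/17
```

For $B = 3$ the IDS formula gives $14/5$, so the entry 2 for 3-Min-IDS evidently comes from a separate, sharper bound.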



Section 5 is devoted to domination problems in directed graphs. We show that in directed graphs with indegree bounded by a constant $B \ge 2$, the directed version of Min-DS has a simple $(B+1)$-approximation algorithm, but it is NP-hard to approximate within any constant smaller than $B - 1$ for $B \ge 3$ (1.36 for $B = 2$). In directed graphs with outdegree bounded by a constant $B \ge 2$, we prove an almost tight approximation lower bound of $\ln B$ for the directed version of Min-DS. We also point out that the problem of deciding whether there exists a feasible solution for the Min-IDS problem in directed graphs is NP-complete (even for bounded degree graphs).
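The simple $(B+1)$-approximation is not spelled out in this overview. One standard way to get this factor, given here as our own sketch rather than the paper's algorithm, assumes that a vertex $v$ counts as dominated when $v$ or one of its in-neighbours is chosen, and applies the frequency argument for set cover: while some vertex $v$ is undominated, take $v$ together with all its in-neighbours. The closed in-neighbourhoods taken this way are pairwise disjoint, each of them contains a vertex of every optimal solution, and each has at most $B+1$ vertices.

```python
from typing import Dict, Hashable, Set


def directed_ds_approx(in_nbrs: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """(B+1)-approximate dominating set of a digraph with in-degree <= B.

    in_nbrs[v] is the set of u with an arc u -> v (every vertex must be a key);
    v counts as dominated when v or one of its in-neighbours is in the result.
    """
    out_nbrs: Dict[Hashable, Set[Hashable]] = {v: set() for v in in_nbrs}
    for v, preds in in_nbrs.items():
        for u in preds:
            out_nbrs[u].add(v)
    dom: Set[Hashable] = set()
    dominated: Set[Hashable] = set()
    for v in in_nbrs:
        if v in dominated:
            continue
        for u in in_nbrs[v] | {v}:  # closed in-neighbourhood of v
            dom.add(u)
            dominated.add(u)
            dominated |= out_nbrs[u]
    return dom
```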

In Section 6 we apply inapproximability results obtained for domination and covering problems to improve on approximation hardness results of two graph optimization problems. We improve the previous lower bound for 3-Max-IM (Maximum Induced Matching). Additionally, our lower bound for $B$-Max-IM in $B$-regular graphs ($B$ large) almost matches the known linear upper bound (only APX-completeness was previously known, with a lower bound very close to 1, even for large $B$). We also establish the first explicit lower bound for the Maximum Leaf Spanning Tree problem, and this hardness result applies also to bipartite graphs with all vertices but one of degree at most 5.



2 Dominating Set Problems in General Graphs 

As is already known, Min-DS in general graphs has the same approximation hardness as Min-SC. In this section we prove similar hardness results for other domination problems, even in restricted graph classes. Let us start with the simple reduction between domination and set cover problems.
DS-SC reduction. Each vertex $v \in V$ of a graph $G$ will correspond to an element of $U$, hence $U := V$, and the collection $\mathcal{S}$ will consist of the sets $N_v \cup \{v\}$ for each vertex $v \in V$ (resp., only $N_v$ for Min-TDS), where $N_v$ is the set of all neighbors of $v$. This reduction exactly preserves feasibility of solutions: every dominating set in $G$ (resp., total dominating set for a graph without isolated vertices) corresponds to a set cover in $(U, \mathcal{S})$ of the same size, and vice versa.
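The DS-SC reduction amounts to one set per vertex. A Python sketch (our naming), assuming a graph stored as a dict of neighbour sets:

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected, symmetric neighbour sets


def ds_to_sc(g: Graph, total: bool = False) -> Tuple[Set[Hashable], Dict[Hashable, FrozenSet[Hashable]]]:
    """DS-SC reduction: U := V and one set per vertex v.

    The set of v is N_v | {v} for Min-DS, or just N_v for Min-TDS (total=True).
    The sets of the vertices in D cover U exactly when D is a dominating
    (respectively, total dominating) set of G.
    """
    universe = set(g)
    sets = {v: frozenset(g[v]) if total else frozenset(g[v] | {v}) for v in g}
    return universe, sets
```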

In this way we get a polynomial time algorithm with approximation ratio $\mathcal{H}_{\deg(G)+1} - \frac{1}{2}$ for Min-DS and with approximation ratio $\mathcal{H}_{\deg(G)} - \frac{1}{2}$ for Min-TDS, using results for Min-SC [7], where $\deg(G)$ denotes the maximum degree of $G$. For the Min-CDS problem there is a known polynomial time algorithm with approximation ratio $\mathcal{H}_{\deg(G)} + 2$ [10].

Now we recall two reductions in the opposite direction that we use to obtain hardness results for dominating set problems. For each instance $(U, \mathcal{S})$ of Min-SC we assign the $(U, \mathcal{S})$-bipartite graph with bipartition $(U, \mathcal{S})$ and edges between $S$ and all elements $x \in S$, for each $S \in \mathcal{S}$.

Split SC-DS reduction. Given an instance $\mathcal{G} = (U, \mathcal{S})$ of Min-SC, create first a $(U, \mathcal{S})$-bipartite graph and then make a clique of all vertices of $\mathcal{S}$. This reduction transforms any set cover in $(U, \mathcal{S})$ into the corresponding dominating set of the same size in the resulting split graph $G$. It is not difficult to see that a dominating set of minimum size in $G$ is achieved also among dominating sets which contain only vertices from $\mathcal{S}$: any dominating set $D$ in $G$ can be efficiently transformed into one, say $D'$, with $|D'| \le |D|$ and $D' \subseteq \mathcal{S}$ (replace each element $x \in D \cap U$ by an arbitrary set $S \ni x$, which dominates $x$ and, via the clique, all of $\mathcal{S}$). Since a dominating set contained in $\mathcal{S}$ induces a clique, the problems Min-CDS, Min-TDS, and Min-DS have the same complexity in graphs constructed using the split SC-DS reduction.

Bipartite SC-DS reduction. Given an instance $\mathcal{G} = (U, \mathcal{S})$ of Min-SC, create first a $(U, \mathcal{S})$-bipartite graph. Then add two new vertices $y$ and $y'$, and connect $y$ to each $S \in \mathcal{S}$ and to $y'$. For the resulting bipartite graph $G$ one can now restrict attention to dominating sets consisting of $y$ and a subset of $\mathcal{S}$ corresponding to a set cover; hence $ds(G) = cds(G) = tds(G) = sc(\mathcal{G}) + 1$.
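The bipartite SC-DS reduction and the identity $ds(G) = sc(\mathcal{G}) + 1$ can be checked on a toy instance. The Python sketch below uses our own names, tags vertices as `("u", x)` or `("s", S)` to keep the two sides apart, builds $G$, and computes $ds(G)$ by exhaustive search, which is feasible only for very small graphs. In the example no single set covers $\{1, 2, 3\}$ but two sets do, so $sc(\mathcal{G}) = 2$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def bipartite_sc_ds(universe: Set[Hashable], sets: Dict[Hashable, FrozenSet[Hashable]]) -> Graph:
    """(U,S)-bipartite graph plus a vertex y adjacent to every set vertex and to y'."""
    g: Graph = {("u", x): set() for x in universe}
    g.update({("s", name): set() for name in sets})
    for name, members in sets.items():
        for x in members:
            g[("s", name)].add(("u", x))
            g[("u", x)].add(("s", name))
    g["y"], g["y'"] = {"y'"}, {"y"}
    for name in sets:
        g["y"].add(("s", name))
        g[("s", name)].add("y")
    return g


def domination_number(g: Graph) -> int:
    """ds(G) by exhaustive search over vertex subsets (tiny graphs only)."""
    vertices = list(g)
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            d = set(cand)
            if all(v in d or g[v] & d for v in vertices):
                return k
    raise AssertionError("unreachable: V itself is a dominating set")
```

Here the search returns $ds(G) = 3 = sc(\mathcal{G}) + 1$, attained for instance by $\{y, A, B\}$.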

In order to transfer Feige's $(1 - \epsilon)\ln|U|$ hardness result [8] from Min-SC to dominating set problems using both SC-DS reductions, we need to know that “hard instances” of Min-SC satisfy $\ln(|U| + |\mathcal{S}|) \sim \ln(|U|)$. Analyzing Feige's construction, we find that this is indeed true. Hence known hardness results for Min-DS hold also for Min-TDS and Min-CDS, even in split and bipartite graphs.

Theorem 1. Min-DS, Min-TDS, and Min-CDS cannot be approximated to
within a factor of $(1 - \varepsilon) \ln n$ in polynomial time for any constant $\varepsilon > 0$, unless
$\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log \log n)})$. The same results hold also in split graphs (and hence
in chordal graphs, and in complements of chordal graphs as well), and in bipartite
graphs.

The Min-IDS problem is NP-hard, and in fact no method to approximate
$ids$ within a factor better than the trivial one, $O(n)$, appears to be known. Halldórsson
[11] proved that Min-IDS cannot be approximated in polynomial time within a
factor of $n^{1-\varepsilon}$ for any $\varepsilon > 0$, unless P = NP.
3 Dominating Set Problems in Bounded Degree Graphs



Now we consider dominating set problems in graphs of maximum degree $B$.
Due to the results in general graphs, we have $t_P(B\text{-Min-DS}) \le H_{B+1} - \frac{1}{2}$,
$t_P(B\text{-Min-TDS}) \le H_B - \frac{1}{2}$, and $t_P(B\text{-Min-CDS}) \le H_B + 2$. In what follows
we prove asymptotically tight lower bounds of $\ln B$ (up to lower order terms) for
all three problems.

B-Minimum Dominating Set 

Trevisan [16], in his analysis of Feige’s construction, proved the following
inapproximability result for B-Min-SC.

Theorem 2 (Trevisan). There are absolute constants $C > 0$ and $B_0 \ge 3$ such
that for every $B \ge B_0$ it is NP-hard to approximate the B-Min-SC problem
within a factor of $\ln B - C \ln \ln B$.



In fact, in the proof of the corresponding NP-hard gap type result an instance
$\Phi$ of SAT of size $n$ is converted to an instance $Q = (U, \mathcal{S})$ of B-Min-SC. There
are parameters $l$ and $m$: $l$ is fixed to $\Theta(\ln \ln B)$ and $m$ is fixed to $\frac{B}{\mathrm{poly}\log B}$.
The produced instances have the following properties: $|U| = m n^l \, \mathrm{poly}\log B$,
$|\mathcal{S}| = n^l \, \mathrm{poly}\log B$, $\Delta(Q) \le B$, and $\deg(U) \le \mathrm{poly}\log B$. Further, if $\Phi$ is
satisfiable then $sc(Q) \le \alpha|\mathcal{S}|$ (for some $\alpha$ easily computable from $n$ and $B$), but
if $\Phi$ is not satisfiable, then $sc(Q) \ge \alpha|\mathcal{S}|(\ln B - C \ln \ln B)$.

We use Trevisan’s result to prove inapproximability results for B-Min-DS
for large $B$. In this case the following reduction from (B − 1, B)-Min-SC to
B-Min-DS can be used as a gap preserving reduction.

SC-DS1 reduction. Let $Q = (U, \mathcal{S})$ be an instance of (B − 1, B)-Min-SC and
consider the corresponding $(U, \mathcal{S})$-bipartite graph. Add a set $W$ of $\lceil |\mathcal{S}|/B \rceil$
new vertices, connected to the $(U, \mathcal{S})$-bipartite graph as follows: each vertex $S \in \mathcal{S}$
is connected to one vertex of $W$, and these edges are allocated to vertices of $W$
so that the degree of each vertex in $W$ is also at most $B$. Let $G$ denote the
bipartite graph of degree at most $B$ constructed in this way. It can be proved
that the SC-DS1 reduction has the property $sc(Q) \le ds(G) \le sc(Q) + \lceil |\mathcal{S}|/B \rceil$.
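The construction of $W$ in the SC-DS1 reduction can be sketched as follows. We assume, as one possible way of allocating the edges (the text leaves the allocation open), that consecutive blocks of $B$ set-vertices share a vertex of $W$; the toy instance and helper names are hypothetical, and the exhaustive search only checks the stated bounds on a tiny example.

```python
from itertools import combinations
from math import ceil

def sc_ds1(universe, sets, B):
    """SC-DS1 sketch: (U,S)-bipartite graph plus ceil(|S|/B) new vertices W.
    Every set-vertex gets exactly one neighbour in W; consecutive blocks of
    B set-vertices share the same w (assumed allocation), so deg(w) <= B."""
    adj = {("u", x): set() for x in universe}
    adj.update({("s", name): set() for name in sets})
    for name, members in sets.items():
        for x in members:
            adj[("s", name)].add(("u", x))
            adj[("u", x)].add(("s", name))
    for i, name in enumerate(sets):
        w = ("w", i // B)
        adj.setdefault(w, set()).add(("s", name))
        adj[("s", name)].add(w)
    return adj

def min_dominating_set_size(adj):
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for D in map(set, combinations(vertices, k)):
            if all(v in D or adj[v] & D for v in adj):
                return k

def min_set_cover_size(universe, sets):
    for k in range(len(sets) + 1):
        for cover in combinations(sets, k):
            if set().union(*(sets[s] for s in cover)) >= universe:
                return k

B = 3                                   # sets of size <= B - 1 = 2, elements in <= B sets
U = {1, 2, 3, 4}
S = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3, 4}, "S4": {1, 4}}
G = sc_ds1(U, S, B)
sc, ds = min_set_cover_size(U, S), min_dominating_set_size(G)
extra = ceil(len(S) / B)                # |W|
print(max(len(nb) for nb in G.values()), sc, ds, extra)
```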

The SC-DS1 reduction translates the NP-hard question for (B − 1, B)-Min-SC
of deciding whether $sc(Q) \le \alpha|\mathcal{S}|$ or $sc(Q) > \beta|\mathcal{S}|$ (for some efficiently
computable functions $\alpha, \beta$) into the NP-hard question of whether
$ds(G) \le (\alpha + \frac{1}{B})|\mathcal{S}|$ or $ds(G) > \beta|\mathcal{S}|$ (assuming $\beta - \alpha > \frac{1}{B}$).

It is easy to check that the SC-DS1 reduction from (B − 1, poly log B)-Min-SC
to B-Min-DS can decrease the approximation hardness factor of $\ln(B-1) - C \ln \ln(B-1)$
from Theorem 2 only marginally, by a small additive term.
Hence the approximation threshold $t_{NP}$ for B-Min-DS, for $B$ sufficiently large, is
again at least $\ln B - C \ln \ln B$, with a slightly larger constant $C$ than in Theorem 2.
Hence we obtain the following theorem.



Theorem 3. There are absolute constants $C > 0$ and $B_0 \ge 3$ such that for
every $B \ge B_0$ it is NP-hard to approximate B-Min-DS (even in bipartite graphs)
within a factor of $\ln B - C \ln \ln B$.
B-Minimum Total Dominating Set and B-Minimum Connected Dominating Set

We slightly modify the SC-DS1 reduction for B-Min-TDS and B-Min-CDS.

SC-DS2 reduction. For an instance $Q = (U, \mathcal{S})$ of (B − 1, B)-Min-SC (with $B$
sufficiently large) construct the $(U, \mathcal{S})$-bipartite graph and add a set $W$ of
$\lceil |\mathcal{S}|/(B-2) \rceil$ new vertices. Connect them to the vertices of $\mathcal{S}$ in the same way as
in the SC-DS1 reduction (now with at most $B - 2$ set-vertices per vertex of $W$), and
add a set $W'$ of additional vertices, $|W'| = |W|$. The vertices of $W$ and $W'$ are
connected into a $2|W|$-cycle in which vertices of $W$ and $W'$ alternate. The result
of this transformation is a bipartite graph $G$ of maximum degree at most $B$ with
the property $sc(Q) \le tds(G) \le cds(G) \le sc(Q) + 2\lceil |\mathcal{S}|/(B-2) \rceil$.

Hence we obtain essentially the same asymptotic results as for B-Min-DS.
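For SC-DS2, the sketch below (our own illustration, again assuming a block-wise allocation of set-vertices to $W$ and using a hypothetical toy instance) adds $W'$ and the alternating cycle, then checks the degree bound, bipartiteness, and the chain $sc(Q) \le tds(G) \le cds(G) \le sc(Q) + 2\lceil |\mathcal{S}|/(B-2) \rceil$ by brute force.

```python
from itertools import combinations
from math import ceil

def sc_ds2(universe, sets, B):
    """SC-DS2 sketch: each w in W serves at most B-2 set-vertices; W' has |W| vertices
    and the cycle w_0, w'_0, w_1, w'_1, ... alternates between W and W'."""
    adj = {("u", x): set() for x in universe}
    adj.update({("s", name): set() for name in sets})
    for name, members in sets.items():
        for x in members:
            adj[("s", name)].add(("u", x))
            adj[("u", x)].add(("s", name))
    k = ceil(len(sets) / (B - 2))
    for i, name in enumerate(sets):
        w = ("w", i // (B - 2))
        adj.setdefault(w, set()).add(("s", name))
        adj[("s", name)].add(w)
    cycle = [v for i in range(k) for v in (("w", i), ("w'", i))]
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):   # for k = 1 this degenerates to one edge
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj, k

def dominating(adj, D):
    return all(v in D or adj[v] & D for v in adj)

def total_dominating(adj, D):
    return all(adj[v] & D for v in adj)

def connected_dominating(adj, D):
    if not dominating(adj, D):
        return False
    start = next(iter(D))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] & D:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == D

def min_size(adj, feasible):
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        if any(feasible(adj, set(D)) for D in combinations(vertices, k)):
            return k

def min_set_cover_size(universe, sets):
    for k in range(len(sets) + 1):
        for cover in combinations(sets, k):
            if set().union(*(sets[s] for s in cover)) >= universe:
                return k

def is_bipartite(adj):
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    stack.append(w)
                elif color[w] == color[v]:
                    return False
    return True

B = 4                                   # sets of size <= B - 1 = 3, elements in <= B sets
U = {1, 2, 3, 4}
S = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3, 4}, "S4": {1, 4}}
G, k = sc_ds2(U, S, B)
sc = min_set_cover_size(U, S)
tds, cds = min_size(G, total_dominating), min_size(G, connected_dominating)
print(max(len(nb) for nb in G.values()), is_bipartite(G), sc, tds, cds, sc + 2 * k)
```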



Theorem 4. There are absolute constants $C > 0$ and $B_0 \ge 3$ such that for every
$B \ge B_0$ it is NP-hard to approximate (even in bipartite graphs) B-Min-TDS,
resp. B-Min-CDS, within a factor of $\ln B - C \ln \ln B$.

B-Minimum Independent Dominating Set

One can easily observe that for any $(B+1)$-claw free graph $G$ we have $|I| \le B|D|$
whenever $I$ is an independent set and $D$ is a dominating set in $G$. Consequently,
any independent dominating set in a $(B+1)$-claw free graph $G$ approximates
Min-IDS, Min-DS, and Max-IS (Maximum Independent Set) within $B$. This
trivially applies to $B$-bounded graphs as well. For many problems significantly
better approximation ratios are known for $B$-bounded graphs than for $(B+1)$-claw
free graphs. For the B-Min-IDS problem (asymptotically) only slightly
better upper bounds are known [1] , namely tp < B — for H > 4 and tp < 2 
for B = 3. (For H-regular instances, B>5,tp<B — 1— 
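Since any maximal independent set is an independent dominating set, the observation above already yields a trivial algorithm. The following sketch (our own illustration) builds one greedily in a given vertex order; on the star $K_{1,3}$ a bad order returns $B = 3$ times the optimum, matching the factor $B$ in the bound $|I| \le B|D|$.

```python
def greedy_independent_dominating_set(adj, order=None):
    """Scan vertices (in the given order) and keep each vertex with no chosen
    neighbour; the result is a maximal independent set, hence an IDS."""
    chosen, blocked = [], set()
    for v in (order if order is not None else sorted(adj)):
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen

def is_independent_dominating(adj, D):
    D = set(D)
    independent = not any(adj[v] & D for v in D)
    dominating = all(v in D or adj[v] & D for v in adj)
    return independent and dominating

C5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}          # K_{1,3}, maximum degree B = 3
print(greedy_independent_dominating_set(C5))
print(greedy_independent_dominating_set(star),
      greedy_independent_dominating_set(star, order=[1, 2, 3, 0]))
```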

The question of whether there are polynomial time algorithms for B-Min-IDS
with ratios $o(B)$ as $B$ approaches infinity is answered in the negative
(unless P = NP) by the following theorem. Our proof extracts the core of the
arguments used in [11] and [13]. Starting from an NP-hard gap result for a bounded
occurrence version of Max-3SAT, we obtain an NP-hard gap type result for
B-Min-IDS, for large $B$.

Theorem 5. There are absolute constants $\delta > 0$ and $B_0$ such that for every
$B \ge B_0$ the B-Min-IDS problem is NP-hard to approximate within $\delta B$. The
same hardness result applies to bipartite graphs as well.



4 Dominating Set Problems in Small Degree Graphs 

Graphs of degree at most 2 have a simple structure and all problems studied above
can be solved efficiently in this class. Thus suppose that $B \ge 3$.

B-Minimum Dominating Set

Using the standard DS-SC reduction and results for (B + 1)-Min-SC [7] we obtain
explicit upper bounds $t_P(B\text{-Min-DS}) \le H_{B+1} - \frac{1}{2}$ for $B = 3, 4, 5, 6$. To obtain
a lower bound on approximability for B-Min-DS from a bounded version of Min-SC
we cannot rely on the split and bipartite SC-DS reductions, as only finitely many
instances of Min-SC are transformed into instances of B-Min-DS by those reductions.
Instead, we can use the following simple VC-DS reduction $f$ from Min-VC to
Min-DS to obtain a lower bound on $t_{NP}$ for B-Min-DS for small values of $B$.
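The upper bounds $H_{B+1} - \frac{1}{2}$ for small $B$ are easy to evaluate exactly. The snippet below (our own helper names) uses Python's `fractions.Fraction`, which always reduces results to lowest terms.

```python
from fractions import Fraction

def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))

def ds_upper_bound(B):
    """H_{B+1} - 1/2: closed neighbourhoods in a graph of maximum degree B have size <= B + 1."""
    return harmonic(B + 1) - Fraction(1, 2)

for B in range(3, 7):
    bound = ds_upper_bound(B)
    print(f"{B}-Min-DS: {bound} ~ {float(bound):.4f}")
```

Evaluating it gives 19/12, 107/60, 39/20 and 293/140 for $B = 3, 4, 5, 6$.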




VC-DS reduction. Given a graph $G = (V, E)$ with $n$ vertices and $m$ edges
(without isolated vertices), replace each edge $e = \{u, v\} \in E$ by a gadget
$G_e$ (Fig. 1). The constructed graph $f(G)$ has $n + 4m$ vertices and $6m$
edges. Moreover, $f(G)$ is bipartite, and if $G$ is of degree $B$ ($\ge 3$) then the
same is true for $f(G)$. Furthermore, the VC-DS reduction has the property
$ds(f(G)) = vc(G) + m$, where $vc(G)$ denotes the minimum cardinality of a vertex
cover in $G$.

Applying the VC-DS reduction to a 3-regular graph $G$ with $n$ vertices produces
a bipartite graph $f(G)$ of maximum degree 3 with $7n$ vertices and $9n$ edges.
Using the NP-hard gap result for 3-Min-VC [3], we obtain that it is NP-hard to
decide whether $ds(f(G))$ is greater than $2.01549586n$ or less than $2.0103305n$;
hence it is NP-hard to approximate 3-Min-DS within any factor smaller than the
ratio of these two bounds.
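The counting behind the 3-Min-DS gap can be reproduced with a few lines of arithmetic. The sketch below only uses quantities stated in the text ($m = 3n/2$ for a 3-regular $G$, $n + 4m$ vertices, $6m$ edges, $ds(f(G)) = vc(G) + m$, and the quoted 3-Min-VC gap); it does not model the gadget $G_e$ of Fig. 1.

```python
from fractions import Fraction

VC_HIGH, VC_LOW = 0.51549586, 0.5103305   # 3-Min-VC gap per vertex of G, as quoted above

def vc_ds_size(n):
    """Vertices and edges of f(G) for a 3-regular G with n vertices, where m = 3n/2."""
    m = Fraction(3 * n, 2)
    return n + 4 * m, 6 * m

# ds(f(G)) = vc(G) + m = vc(G) + 1.5 n, so both gap thresholds shift by 1.5 n.
ds_high, ds_low = VC_HIGH + 1.5, VC_LOW + 1.5
print(vc_ds_size(10))
print(f"ds > {ds_high:.8f} n  vs  ds < {ds_low:.7f} n,  ratio {ds_high / ds_low:.6f}")
```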

Theorem 6. It is NP-hard to approximate (even in bipartite graphs) 3-Min-DS
within 1 + ^ .

For larger values of $B$, $B \ge 4$, better inapproximability results can be achieved
by the following SC-DS3 reduction.

SC-DS3 reduction. From an instance $Q = (U, \mathcal{S})$ of Min-SC we first construct
the $(U, \mathcal{S})$-bipartite graph. Then for each $S \in \mathcal{S}$ we pick one fixed representative
$u_S \in S$ and add new edges to the $(U, \mathcal{S})$-bipartite graph connecting $S$
with each other $S' \in \mathcal{S}$ containing $u_S$. (We avoid creating multiple edges.) Let
$G$ denote the resulting graph. It can be proved that the SC-DS3 reduction has
the property $ds(G) = sc(Q)$.
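A sketch of the SC-DS3 reduction, assuming (purely for illustration) that the representative $u_S$ defaults to the smallest element of $S$; the toy instance and helper names are hypothetical, and the exhaustive check only confirms $ds(G) = sc(Q)$ on that instance.

```python
from itertools import combinations

def sc_ds3(universe, sets, representative=None):
    """(U,S)-bipartite graph, then join each set-vertex S to every other set-vertex
    S' containing the representative u_S; adjacency sets avoid multiple edges."""
    adj = {("u", x): set() for x in universe}
    adj.update({("s", name): set() for name in sets})
    for name, members in sets.items():
        for x in members:
            adj[("s", name)].add(("u", x))
            adj[("u", x)].add(("s", name))
    for name, members in sets.items():
        u = representative[name] if representative else min(members)   # assumed default choice
        for other, other_members in sets.items():
            if other != name and u in other_members:
                adj[("s", name)].add(("s", other))
                adj[("s", other)].add(("s", name))
    return adj

def min_dominating_set_size(adj):
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for D in map(set, combinations(vertices, k)):
            if all(v in D or adj[v] & D for v in adj):
                return k

def min_set_cover_size(universe, sets):
    for k in range(len(sets) + 1):
        for cover in combinations(sets, k):
            if set().union(*(sets[s] for s in cover)) >= universe:
                return k

U = {1, 2, 3, 4}
S = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {1, 4}}
G = sc_ds3(U, S)
print(min_set_cover_size(U, S), min_dominating_set_size(G))
```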

In what follows we prove that the SC-DS3 reduction can be modified to
one from (B − 1)-Min-VC (with a perfect matching) to B-Min-DS that
preserves the optimum of the objective function. Let $H = (V, E)$ be an instance
of (B − 1)-Min-VC with a fixed perfect matching $M$ in it. Let $Q = (E, V_H)$
be the dual hypergraph to the (hyper)graph $H$. Due to duality, $Q$ can be viewed as
a (B − 1, 2)-instance of Min-SC, and $sc(Q) = vc(H)$. The corresponding
$(E, V_H)$-bipartite graph for $Q$ is just the subdivision of $H$ (on every edge $e$ put
a single vertex, labeled by $e$), if one identifies each $v \in V$ with the corresponding
set $v_H$ containing all edges incident with $v$ in $H$. Now we consider the SC-DS3
reduction, and for each set $S$ (corresponding to $v \in V$) we take as $u_S$ exactly that
edge incident with $v$ in $H$ that belongs to $M$. Hence the resulting graph $G$ can be
obtained from the subdivision of $H$ by adding the edges of $M$. Therefore, $G$ is of
degree at most $B$ and $ds(G) = sc(Q) = vc(H)$.
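For the matching-based variant, the graph $G$ is simply the subdivision of $H$ plus the edges of $M$. The sketch below (our own helper names) builds it for $H = K_4$, a 3-regular graph with the perfect matching $\{ab, cd\}$, and checks by brute force that $G$ has maximum degree 4 and that $ds(G) = vc(H)$.

```python
from itertools import combinations

def subdivision_plus_matching(vertices, edges, matching):
    """Put a new vertex ("e", u, v) on every edge uv of H, then add the edges of M."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        e = ("e", u, v)
        adj[e] = {u, v}
        adj[u].add(e)
        adj[v].add(e)
    for u, v in matching:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def min_vertex_cover_size(vertices, edges):
    for k in range(len(vertices) + 1):
        for C in combinations(vertices, k):
            if all(u in C or v in C for u, v in edges):
                return k

def min_dominating_set_size(adj):
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for D in map(set, combinations(vertices, k)):
            if all(v in D or adj[v] & D for v in adj):
                return k

V = ["a", "b", "c", "d"]
E = list(combinations(V, 2))            # K4: 3-regular, so B - 1 = 3 and B = 4
M = [("a", "b"), ("c", "d")]            # a perfect matching of K4
G = subdivision_plus_matching(V, E, M)
print(max(len(nb) for nb in G.values()), min_vertex_cover_size(V, E), min_dominating_set_size(G))
```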

It is easy to verify that the NP-hard gap results for B-Min-VC [3] ($B = 3$, 4,
and 5) apply to B-regular graphs with a perfect matching as well. (For $B = 3$,
4 it is proved in [3] that the hard instances produced there are B-regular and
edge B-colorable.) This means that for $B = 4$, 5, and 6 we obtain for B-Min-DS
the same lower bound as for (B − 1)-Min-VC.

Theorem 7. It is NP-hard to approximate 4-Min-DS within 1+ 5-Min-DS
within 1 + and 6-Min-DS within 1

Remark 1. From results [2] for Minimum Edge Dominating Set it also follows 
that for 4-regular graphs obtained as line graphs of 3-regular graphs it is NP- 
hard to approximate MiN-DS within 1-|- Recall that for Min-DS restricted 
to line graphs there is a simple 2-approximation algorithm, but it is NP-hard to 
approximate within a constant smaller than | by results of [2]. 

B-Minimum Independent Dominating Set

As proved in [1], $t_P(3\text{-Min-IDS}) \le 2$, and the upper bound

tp(77-MiN-IDS) < 77 — holds for any small value 77 > 4. It means 

tp(4-MiN-IDS) < ff, tp(5-MiN-IDS) < f|, and tp(6-MiN-IDS) < To 
obtain inapproximability results for B-Min-IDS for small values of $B \ge 3$, we
use the following polynomial time reduction from Min-SC.

SC-IDS reduction. Let an instance $Q = (U, \mathcal{S})$ of (B − 1, B)-Min-SC be given.
Start with the corresponding $(U, \mathcal{S})$-bipartite graph and for each $S \in \mathcal{S}$ add two
new vertices $S'$, $S''$ and two edges $\{S, S'\}$, $\{S', S''\}$. The resulting graph $G$ is
bipartite of maximum degree at most $B$ and $ids(G) = ds(G) = sc(Q) + |\mathcal{S}|$.
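The SC-IDS reduction can likewise be checked on a toy instance. In the sketch below (hypothetical instance and helper names) every set-vertex receives the pendant path $S - S' - S''$, and exhaustive search compares $ids(G)$ and $ds(G)$ with $sc(Q) + |\mathcal{S}|$.

```python
from itertools import combinations

def sc_ids(universe, sets):
    """(U,S)-bipartite graph plus a pendant path S - S' - S'' for every set S."""
    adj = {("u", x): set() for x in universe}
    adj.update({("s", name): set() for name in sets})
    for name, members in sets.items():
        for x in members:
            adj[("s", name)].add(("u", x))
            adj[("u", x)].add(("s", name))
    for name in sets:
        s, s1, s2 = ("s", name), ("s'", name), ("s''", name)
        adj[s].add(s1)
        adj[s1] = {s, s2}
        adj[s2] = {s1}
    return adj

def dominating(adj, D):
    return all(v in D or adj[v] & D for v in adj)

def independent_dominating(adj, D):
    return dominating(adj, D) and not any(adj[v] & D for v in D)

def min_size(adj, feasible):
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        if any(feasible(adj, set(D)) for D in combinations(vertices, k)):
            return k

U = {1, 2, 3}
S = {"A": {1, 2}, "B": {2, 3}, "C": {3}}   # sc(Q) = 2 and |S| = 3
G = sc_ids(U, S)
print(min_size(G, dominating), min_size(G, independent_dominating))
```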

Using the previous reduction we can obtain an NP-hard gap result for B-Min-IDS
(and B-Min-DS as well) from the one for (B − 1, B)-Min-SC, or equivalently, for the
hypergraph (B, B − 1)-Min-VC problem, whose special case is the (B − 1)-Min-VC
problem in graphs. More precisely, we can translate the NP-hard gap results of
[3] for (B − 1)-Min-VC into ones for B-Min-IDS as follows: starting from
a 3-regular instance of 3-Min-VC with $n$ vertices, using the SC-IDS reduction we
obtain a bipartite graph $G$ of degree at most 4 with the NP-hard question
of whether $ids(G)$ is greater than $1.51549586n$ or less than $1.5103305n$. Hence
it is NP-hard to approximate 4-Min-IDS, even in bipartite graphs, within any factor
smaller than the ratio of these two bounds. In the same way we can obtain
inapproximability results for 5-Min-IDS and 6-Min-IDS starting from 4-regular
and 5-regular graphs, respectively.

Theorem 8. It is NP-hard to approximate (even in bipartite graphs) 4-Min-IDS
within 1 + 7 )^, 5-Min-IDS within 1 + 6-Min-IDS within 1 +

zyo ’ 151 ’ 145 

To obtain a lower bound for 3-Min-IDS, let us consider the following reduction
$h$ from Min-VC to Min-IDS:
VC-IDS reduction. Given a graph $G = (V, E)$ with $n$ vertices and $m$ edges
(without isolated vertices), replace each edge $e = \{u, v\} \in E$ by a simple edge
gadget $G_e$ (Fig. 2). The graph $h(G)$ constructed in this way has $n + 6m$ vertices
and $8m$ edges. Moreover, if $G$ is of maximum degree $B$ ($\ge 3$) then the same is
true for $h(G)$, and $ids(h(G)) = vc(G) + 2m$.

Applying the VC-IDS reduction to an instance $G$ of 3-Min-VC (with $n$ vertices)
and using the NP-hard gap result for it [2], we obtain that it is NP-hard to
decide whether $ids(h(G))$ is greater than $3.51549586n$ or less than $3.5103305n$.
Hence it is NP-hard to approximate 3-Min-IDS within any factor smaller than the
ratio of these two bounds.
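The gap values quoted for 4-Min-IDS and 3-Min-IDS follow from the 3-Min-VC gap by adding $n$ (SC-IDS on the dual instance, where $|\mathcal{S}| = n$) or $2m = 3n$ (VC-IDS). This arithmetic sketch recomputes them and the resulting ratios; it assumes a 3-regular input with an even number $n$ of vertices.

```python
VC_HIGH, VC_LOW = 0.51549586, 0.5103305   # 3-Min-VC gap (times n), as quoted in the text

def vc_ids_size(n):
    """Vertex/edge counts of h(G) for a 3-regular G with n vertices (n even, m = 3n/2)."""
    m = 3 * n // 2
    return n + 6 * m, 8 * m

# SC-IDS on the dual of a 3-regular graph: |S| = n, so ids(G) = vc(H) + n.
sc_ids = (VC_HIGH + 1, VC_LOW + 1)
# VC-IDS: ids(h(G)) = vc(G) + 2m = vc(G) + 3n.
vc_ids = (VC_HIGH + 3, VC_LOW + 3)

for label, (high, low) in [("4-Min-IDS", sc_ids), ("3-Min-IDS", vc_ids)]:
    print(f"{label}: > {high:.8f} n  vs  < {low:.7f} n,  ratio {high / low:.6f}")
print(vc_ids_size(10))
```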

Theorem 9. It is NP-hard to approximate 3-Min-IDS within 1 + g|g.



5 Minimum Dominating Set in Directed Graphs 

In a directed graph $G = (V, E)$ a set $D \subseteq V$ is a dominating set if for each
$v \in V \setminus D$ there is $u \in D$ such that $uv \in E$. Again, this is a special case of Min-SC
due to the following directed DS-SC reduction:

Directed DS-SC reduction. For a directed graph $G = (V, E)$ we define
an instance $(U, \mathcal{S})$ of Min-SC as $U := V$ and $\mathcal{S} := \{N^+_v : v \in V\}$, where
$N^+_v = \{v\} \cup \{u \in V : vu \in E\}$. For such an instance $(U, \mathcal{S})$, set covers are in
one-to-one correspondence with dominating sets in $G$.
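The directed DS-SC reduction takes one closed out-neighbourhood per vertex. The sketch below (our own illustration on a directed 4-cycle) checks, subset by subset, that a vertex set is dominating exactly when the corresponding sets cover $V$.

```python
from itertools import combinations

def directed_ds_to_sc(vertices, arcs):
    """Universe U := V and one set N+_v = {v} | {u : vu in E} per vertex v."""
    closed_out = {v: {v} for v in vertices}
    for u, v in arcs:
        closed_out[u].add(v)
    return set(vertices), closed_out

def is_dominating(vertices, arcs, D):
    """Every vertex is in D or has an in-neighbour in D."""
    return set(D) | {v for u, v in arcs if u in D} >= set(vertices)

V = [0, 1, 2, 3]
A = [(0, 1), (1, 2), (2, 3), (3, 0)]            # directed 4-cycle
U, N = directed_ds_to_sc(V, A)

agree = all(
    (set().union(*(N[v] for v in D)) >= U) == is_dominating(V, A, D)
    for k in range(len(V) + 1) for D in combinations(V, k))
smallest = min(len(D) for k in range(len(V) + 1)
               for D in combinations(V, k) if is_dominating(V, A, D))
print(agree, smallest)
```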

Minimum Dominating Set in Graphs with Bounded Indegree 
Due to the directed DS-SC reduction, instances of Min-DS with indegree
bounded by a constant $B$ can be viewed as instances $Q$ of Min-SC in which every
element occurs in at most $B + 1$ sets. Hence the problem has a simple
$(B+1)$-approximation algorithm in this case. Furthermore, the case $B = 1$ can
easily be solved exactly.

Asymptotically, we can obtain an almost matching lower bound by the following
reduction from restricted instances of Min-SC to directed instances of Min-DS:
for an instance $Q = (U, \mathcal{S})$ with $\deg(U) \le B$ construct a graph $G$ with the vertex
set $V = U \cup \mathcal{S} \cup \{S_0\}$, where $S_0$ is a new vertex. Add an edge $S_0S$ in $G$ for
each $S \in \mathcal{S}$, and an edge $Sx$ for each $S \in \mathcal{S}$ and each $x \in S$. The directed
graph $G = (V, E)$ created in this way has indegree bounded by $B$. Obviously,
there are minimum dominating sets in $G$ consisting of $S_0$ and $\mathcal{C} \subseteq \mathcal{S}$, where $\mathcal{C}$
is a minimum set cover in $(U, \mathcal{S})$. Hence this reduction preserves NP-hard gap
results for Min-SC restricted to instances $Q$ with $\deg(U) \le B$. Recall that this
is equivalent to the Min-VC problem in hypergraphs with hyperedges of size at
most $B$. Recently Dinur et al. [4] gave a nearly tight lower bound $B - 1$ on
approximability in hypergraphs with all hyperedges of size exactly $B$, $B \ge 3$.
For $B = 2$ the lower bound 1.36 follows from the one for Min-VC in graphs [5].
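This reduction adds the source vertex $S_0$, which has indegree 0 and therefore belongs to every dominating set. The sketch below (our own illustration, hypothetical toy instance, brute-force check) confirms the indegree bound and that the optimum equals $sc(Q) + 1$.

```python
from itertools import combinations

def sc_to_directed_ds(universe, sets):
    """Vertices U | S | {S0}; arcs S0 -> S for every set S and S -> x for every x in S."""
    vertices = [("u", x) for x in universe] + [("s", s) for s in sets] + ["S0"]
    arcs = [("S0", ("s", s)) for s in sets]
    arcs += [(("s", s), ("u", x)) for s, members in sets.items() for x in members]
    return vertices, arcs

def max_indegree(vertices, arcs):
    indegree = dict.fromkeys(vertices, 0)
    for _, v in arcs:
        indegree[v] += 1
    return max(indegree.values())

def min_dominating_set_size(vertices, arcs):
    for k in range(len(vertices) + 1):
        for D in combinations(vertices, k):
            if set(D) | {v for u, v in arcs if u in D} >= set(vertices):
                return k

U = {1, 2, 3, 4}
S = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {1, 4}}   # every element in 2 sets, sc = 2
V, arcs = sc_to_directed_ds(U, S)
print(max_indegree(V, arcs), min_dominating_set_size(V, arcs))
```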

Theorem 10. It is NP-hard to approximate the Min-DS problem in directed
graphs with indegree bounded by a constant $B$ within any constant smaller than
$B - 1$ for $B \ge 3$, and within 1.36 for $B = 2$. On the other hand, the problem has
a simple $(B+1)$-approximation algorithm for $B \ge 2$.
Minimum Dominating Set in Graphs with Bounded Outdegree

Instances of Min-DS with outdegree bounded by a constant $B$ can be viewed
as instances of Min-SC with sets of size at most $B + 1$. Hence the problem is
polynomially solvable for $B = 1$, and for $B \ge 2$ a polynomial time approximation
algorithm with ratio $H_{B+1} - \frac{1}{2} < \ln B + O(1)$ is known [7].

To obtain some lower bounds for Min-DS in graphs with outdegree bounded
by a constant $B$ we consider the case when both outdegree and indegree are bounded
by $B$. This case is at least as hard as undirected B-Min-DS: replace, in instances
of B-Min-DS, every edge $\{u, v\}$ by two directed edges $uv$ and $vu$, and observe
that this reduction preserves dominating sets. Hence directly from Theorem 3
we obtain the following theorem.

Theorem 11. There are absolute constants $C > 0$ and $B_0 \ge 3$ such that for
every $B \ge B_0$ it is NP-hard to approximate Min-DS in directed graphs with
outdegree bounded by a constant $B$ within $\ln B - C \ln \ln B$. However, there exists
an $(H_{B+1} - \frac{1}{2})$-approximation algorithm for the problem for any $B \ge 2$.

Other Dominating Set Problems in Directed Graphs 

The variants of Min-DS, namely Min-TDS, Min-CDS and Min-IDS, can
be formulated for directed graphs as well. Mainly for connected domination
problems there are many interesting questions left open. Min-IDS in directed
graphs is very different from its undirected counterpart. The problem of deciding
whether there exists a feasible solution in an input directed graph is NP-complete
even in bounded degree graphs, due to the following reduction from
Max-3SAT-5: given an instance $\phi$, create a graph $G_\phi$ with two vertices labeled
by $x$ and $\bar{x}$ for every variable $x$, and three vertices labeled by $c$, $c'$, and $c''$
for every clause $C$. Edges are chosen so that every pair $x, \bar{x}$ is a 2-cycle,
every triple $c, c', c''$ is a directed 3-cycle, and there is an edge $\ell c$ whenever
literal $\ell$ is in a clause $C$. One can easily check that $G_\phi$ has an independent
dominating set iff $\phi$ is satisfiable. Moreover, $G_\phi$ has full degree bounded by 7.
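The reduction from Max-3SAT-5 can be tested by brute force on tiny formulas. In the sketch below (our own illustration), literals are encoded as signed integers, the 3-cycle is oriented $c \to c' \to c'' \to c$ (an assumption; the text only says the triple is a directed 3-cycle, and the other orientation is symmetric), and clauses with fewer than three literals keep the search small.

```python
from itertools import combinations, product

def sat_to_digraph(num_vars, clauses):
    """Literals are +x / -x for x = 1..num_vars; clauses are tuples of literals."""
    vertices, arcs = [], set()
    for x in range(1, num_vars + 1):
        vertices += [x, -x]
        arcs |= {(x, -x), (-x, x)}                   # 2-cycle between x and not-x
    for j, clause in enumerate(clauses):
        c, c1, c2 = ("c", j), ("c'", j), ("c''", j)
        vertices += [c, c1, c2]
        arcs |= {(c, c1), (c1, c2), (c2, c)}         # directed 3-cycle c -> c' -> c'' -> c
        arcs |= {(lit, c) for lit in clause}         # edge from each literal of C to c
    return vertices, arcs

def has_independent_dominating_set(vertices, arcs):
    everything = set(vertices)
    for k in range(len(vertices) + 1):
        for D in map(set, combinations(vertices, k)):
            independent = not any(u in D and v in D for u, v in arcs)
            if independent and D | {v for u, v in arcs if u in D} >= everything:
                return True
    return False

def satisfiable(num_vars, clauses):
    for bits in product([False, True], repeat=num_vars):
        true_literals = {x if value else -x for x, value in enumerate(bits, start=1)}
        if all(any(lit in true_literals for lit in clause) for clause in clauses):
            return True
    return False

sat = (3, [(1, 2, 3), (-1, -2, 3)])
unsat = (1, [(1,), (-1,)])
for formula in (sat, unsat):
    print(satisfiable(*formula), has_independent_dominating_set(*sat_to_digraph(*formula)))
```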

6 Application to Other Problems 

Maximum Induced Matching problem (Max-IM) 

A matching $M$ in a graph $G = (V, E)$ is induced if for each edge $e = \{u, v\} \in E$,
$u, v \in V(M)$ implies $e \in M$. The objective of Max-IM is to find a maximum
induced matching in $G$; let $im(G)$ denote its cardinality.
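The definition of an induced matching translates directly into a check. The following sketch (our own helper names) also computes $im(G)$ by exhaustive search, which is exponential and only suitable for very small graphs.

```python
from itertools import combinations

def is_induced_matching(edges, M):
    """M is induced iff it is a matching and every edge of G with both ends
    covered by M belongs to M."""
    M = {frozenset(e) for e in M}
    covered = set().union(*M)
    if 2 * len(M) != len(covered):      # edges of M must be pairwise disjoint
        return False
    return all(frozenset(e) in M for e in edges if set(e) <= covered)

def max_induced_matching(edges):
    for k in range(len(edges), -1, -1):
        if any(is_induced_matching(edges, M) for M in combinations(edges, k)):
            return k

P4 = [(0, 1), (1, 2), (2, 3)]
P5 = P4 + [(3, 4)]
print(is_induced_matching(P4, [(0, 1), (2, 3)]), max_induced_matching(P4), max_induced_matching(P5))
```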

The problem is known to be NP-complete even in bipartite graphs of maximum
degree 3, and the current state of the art can be found in [6], [14], and
[17]. For B-Max-IM, $B \ge 3$, any inclusionwise maximal induced matching
approximates the optimal solution within $2(B - 1)$ (see [17]). This was improved
to an asymptotic ratio $B - 1$ in B-regular graphs in [6], where the proof of
APX-completeness of B-Max-IM in B-regular graphs is also given.

In what follows we present a lower bound for B-Max-IM in B-regular graphs
(for large $B$) that approaches infinity with $B$ and almost matches the linear upper
bound. Firstly, one can easily check that the lower bound given by
Trevisan [16] for B-Max-IS applies to B-regular graphs as well. Now consider
the following transformation $g$ for a $(B-1)$-regular graph $G = (V, E)$: take
another copy $G' = (V', E')$ of the same graph $G$ (with $v' \in V'$ corresponding to
$v \in V$), and make every pair $\{v, v'\}$ adjacent. The resulting graph is B-regular
and it is easy to observe that $is(G) \le im(g(G)) \le 2\,is(G)$. Hence a lower bound
on approximability for B-Max-IM in B-regular graphs is at least half of Trevisan’s
bound for B-Max-IS, and hence again of the same form.

For any $B \ge 4$ we can use the following simple reduction $f$ from (B − 1)-Max-IS
to B-Max-IM: $f(G)$ is constructed from a graph $G$ by adding a pendant edge $\{v, v'\}$
at each vertex $v$ of $G$. Obviously, $im(f(G)) = is(G)$, and hence NP-hard gap
results for (B − 1)-Max-IS directly translate to ones for B-Max-IM. In particular,
$t_{NP}(4\text{-Max-IM}) >$ If, $t_{NP}(5\text{-Max-IM}) >$ |f, and $t_{NP}(6\text{-Max-IM}) >$ ||.

Obtaining any decent lower bound for 3-Max-IM is more difficult. One can
observe (e.g., [14]) that for any graph $G = (V, E)$ its subdivision $G^\circ$ (replace
every edge $\{u, v\}$ in $G$ with a path $u, w, v$ through a new vertex $w$) satisfies
$im(G^\circ) = |V| - ds(G)$. Using the NP-hard gap result for 3-Min-DS from
Theorem 6, we obtain instances $G^\circ$ of maximum degree 3 with $16n$ vertices, $18n$
edges, and with the NP-hard question of deciding whether $im(G^\circ)$ is greater
than $4.9896695n$ or less than $4.9845042n$. Hence it is NP-hard to approximate
3-Max-IM, even in subdivision (and, in particular, bipartite) graphs, within any
factor smaller than the ratio of these two bounds. This improves the previous lower
bound for the 3-Max-IM problem in bipartite graphs from [6]. Using the reduction
from Max-IS to Max-IM presented in [6], we can also improve the previous lower
bound for 3-Max-IM in general graphs. From a 3-regular instance $G$ of 3-Max-IS
with $n$ vertices, in combination with the NP-hard gap results for such instances [3],
we produce an instance $G'$ of 3-Max-IM (with $5n$ vertices and with
$im(G') = n + is(G)$) with the NP-hard question of deciding whether $im(G')$ is
greater than $1.51549586n$ or less than $1.5103305n$.
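The identity $im(G^\circ) = |V| - ds(G)$ for subdivisions can be checked on small graphs, and the gap values for 3-Max-IM follow from $im(G^\circ) = 7n - ds(f(G))$. The sketch below (our own illustration) does both; the brute force is only meant for toy graphs.

```python
from itertools import combinations

def subdivide(edges):
    """G°: replace every edge {u, v} by a path u - w - v through a new vertex w."""
    out = []
    for u, v in edges:
        w = ("w", u, v)
        out += [(u, w), (w, v)]
    return out

def min_dominating_set_size(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for k in range(len(vertices) + 1):
        for D in map(set, combinations(vertices, k)):
            if all(v in D or adj[v] & D for v in vertices):
                return k

def is_induced_matching(edges, M):
    M = {frozenset(e) for e in M}
    covered = set().union(*M)
    if 2 * len(M) != len(covered):
        return False
    return all(frozenset(e) in M for e in edges if set(e) <= covered)

def max_induced_matching(edges):
    for k in range(len(edges), -1, -1):
        if any(is_induced_matching(edges, M) for M in combinations(edges, k)):
            return k

examples = [
    (list(range(5)), [(i, (i + 1) % 5) for i in range(5)]),   # 5-cycle
    ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)]),                 # star K_{1,3}
]
results = [(max_induced_matching(subdivide(E)), len(V) - min_dominating_set_size(V, E))
           for V, E in examples]
print(results)

# Gap arithmetic: f(G) has 7n vertices, so im(G°) = 7n - ds(f(G)).
im_high, im_low = 7 - 2.0103305, 7 - 2.01549586
print(im_high, im_low)
```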

Theorem 12. It is NP-hard to approximate 3-Max-IM within 1 + and
within 1 + ^ in graphs that are additionally bipartite. Further, it is NP-hard to
approximate 4-Max-IM within 1 + |j, 5-Max-IM within and 6-Max-IM
within 1 + ^. Asymptotically, it is NP-hard to approximate B-Max-IM within
a factor (in B-regular graphs as well).

Maximum Leaf Spanning Tree problem (Max-LST) 

The goal of Max-LST is, for a given connected graph, to find a spanning tree
with the maximum number of leaves. The problem is approximable within 3 [15]
and known to be APX-complete [9].

If $G = (V, E)$ is a connected graph with $|V| \ge 3$, then $|V| - cds(G)$ is the
maximum number of leaves in a spanning tree of $G$. This simple observation
allows us to obtain the first explicit inapproximability results for Max-LST.
The NP-hard gap result for 4-Min-VC in 4-regular graphs [3] implies the same
NP-hard gap for the (4, 2)-Min-SC problem due to the duality of both problems.
Hence it is NP-hard to decide whether the optimum for (4, 2)-Min-SC is greater than
$0.5303643725n$ or smaller than $0.520242915n$, where $n$ is the number of vertices
of the dual 4-regular graph. Applying the bipartite SC-DS reduction to such hard
instances of (4, 2)-Min-SC we obtain a bipartite graph with $3n + 2$ vertices, all
but one of degree at most 5, and with the NP-hard question for Max-LST of
deciding whether the optimum is less than $2.469635627n + 1$ or greater than
$2.479757085n + 1$. Hence an explicit inapproximability bound for Max-LST follows.
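The Max-LST gap is again pure arithmetic: the dual of a 4-regular graph with $n$ vertices has $2n$ elements and $n$ sets, the bipartite SC-DS reduction adds $y$ and $y'$, and the maximum number of leaves is $|V| - cds(G) = 3n + 2 - (sc(Q) + 1)$. This sketch (our own helper names) recomputes the quoted coefficients.

```python
SC_HIGH, SC_LOW = 0.5303643725, 0.520242915   # (4,2)-Min-SC gap (times n), as quoted

def max_lst_instance(n, sc):
    """Bipartite SC-DS applied to the dual of a 4-regular graph H with n vertices:
    |U| = |E(H)| = 2n elements, |S| = n sets, plus y and y'.
    Maximum number of leaves = |V| - cds(G) = |V| - (sc + 1)."""
    num_vertices = 2 * n + n + 2
    return num_vertices, num_vertices - (sc + 1)

# Per-n coefficients of the leaf counts 3n + 1 - sc(Q) at the two ends of the gap.
print(3 - SC_HIGH, 3 - SC_LOW)
print(max_lst_instance(100, 52))
```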

Theorem 13. It is NP-hard to approximate (even in bipartite graphs with all 
vertices but one of degree at most 5) Max-LST within 1 + 
Abstract. The following buffer management problem is studied: packets
with specified weights and deadlines arrive at a network switch and
need to be forwarded so that the total weight of forwarded packets is
maximized. A packet not forwarded before its deadline brings no profit.

The main result is an online $64/33 \approx 1.939$-competitive algorithm, the first
deterministic algorithm for this problem with competitive ratio below
2. In the s-uniform case, where for all packets the deadline equals the
arrival time plus $s$, we give a $5 - \sqrt{10} \approx 1.838$-competitive algorithm.

This algorithm achieves the same ratio in a more general scenario when
all packets are similarly ordered. For the 2-uniform case we give an
algorithm with ratio $\approx 1.377$ and a matching lower bound.

1 Introduction 

One of the issues arising in IP-based QoS networks is how to manage the packet 
flows at a router level. In particular, in case of overloading, when the total 
incoming traffic exceeds the buffer size, the buffer management policy needs 
to determine which packets should be dropped by the router. Kesselman et 
al. [5] postulate that the packet drop policies can be modeled as combinatorial 
optimization problems. Our model is called buffer management with bounded 
delay in [5], and is defined as follows: Packets arrive at a network switch. Each 
packet is characterized by a positive weight and a deadline before which it must 
be transmitted. Packets can only be transmitted at integer time steps. If the 
deadline of a packet is reached while it is still being buffered, the packet is lost. 
The goal is to maximize the weighted number of forwarded packets. 

This buffer management problem is equivalent to the online version of the
following single-machine unit-job scheduling problem. We are given a set of
unit-length jobs, with each job $j$ specified by a triple $(r_j, d_j, w_j)$, where $r_j$ and $d_j$
are integral release times and deadlines, and $w_j$ is a non-negative real weight. One
job can be processed at each integer time. We use the term weighted throughput
or gain for the total weight of the jobs completed by their deadline. The goal is
to compute a schedule that maximizes the weighted throughput.

In the online version of the problem jobs arrive over time and the algorithm
needs to schedule one of the pending jobs without knowledge of the future. An
online algorithm $A$ is $R$-competitive if its gain on any instance is at least $1/R$
times the optimal gain on this instance. The competitive ratio of $A$ is the infimum
of such values $R$. It is standard to view the online problem as a game between
an online algorithm $A$ and an adversary, who issues the jobs and schedules them
in order to maximize the ratio between his gain and the gain of $A$.

Past work. A simple greedy algorithm that always schedules the heaviest
available job is 2-competitive, and this is the best previous bound for deterministic
algorithms for this problem. A lower bound of $\phi \approx 1.618$ was shown in [1,3,4].

Some restrictions on instances of the problem have been studied in the
literature [5,1,3]. Let the span of a job be the difference between its deadline and
its release time. In s-uniform instances the span of each job is exactly
$s$. In s-bounded instances, the span of each job is at most $s$. The lower bound
of $\phi \approx 1.618$ in [1,3,4] applies even to 2-bounded instances. A matching upper
bound for the 2-bounded case was presented in [5]. Algorithms for 2-uniform
instances were studied by Andelman et al. [1], who established a lower bound
of $\frac{1}{2}(\sqrt{3} + 1) \approx 1.366$ and an upper bound of $\sqrt{2} \approx 1.414$. This upper bound
is tight for memoryless algorithms [2], that is, algorithms which base their
decisions only on the weights of pending jobs and are invariant under scaling of
weights. The first deterministic algorithms with competitive ratio lower than 2
for the s-bounded instances appear in [2].

Kesselman et al. [5,6] consider a different model related to the s-uniform case: 
packets do not have individual deadlines, instead they are stored in a buffer of 
capacity s and they are required to be served in FIFO order (i.e., if a packet is 
served, all packets in the buffer that arrived before the served one are dropped).
Any algorithm in this FIFO model applies also to s-bounded model and has 
the same competitive ratio [6]. Recently, Bansal et al. [7] gave a deterministic 
1.75-competitive algorithm in the FIFO model; this implies a 1.75-competitive 
algorithm for the s-uniform case. 

A randomized 1.582-competitive algorithm for the general case was given 
in [2]. For 2-bounded instances, there is a 1.25-competitive algorithm [2] that 
matches the lower bound [3]. For the 2-uniform case the currently best lower 
bound for randomized algorithms is 1.172 [2]. 

Our results. Our main result is a deterministic $64/33 \approx 1.939$-competitive
algorithm for the general case. This is the first deterministic algorithm for the
problem with competitive ratio strictly below 2. All of the algorithms given
previously in [5,1,2,6] achieved competitive ratio strictly below 2 only when
additional restrictions on instances were placed.

For the s-uniform case, we give an algorithm with ratio $5 - \sqrt{10} \approx 1.838$.
In fact, this result holds in a more general scenario when all jobs are similarly
ordered, that is, when $r_i < r_j$ implies $d_i \le d_j$ for all jobs $i, j$. Note that this
includes s-uniform and 2-bounded instances, but not s-bounded ones for $s \ge 3$.

Finally, we completely solve the 2-uniform case: we give an algorithm with 
competitive ratio ~ 1.377 and a matching lower bound. Note that this ratio is 
strictly in-between the previous lower and upper bounds from [1]. 
2 Terminology and Notation

A schedule $S$ specifies which jobs are executed, and for each executed job $j$ it
specifies an integral time $t$, $r_j \le t < d_j$, when it is scheduled. Only one job
can be scheduled at any $t$. The throughput or gain of a schedule $S$ is the total
weight of the jobs executed in $S$. If $A$ is a scheduling algorithm, by $gain_A(I)$ we
denote the gain of the schedule computed by $A$ on $I$. The optimal gain on $I$ is
denoted by $opt(I)$. A job $i$ is pending in $S$ at time $t$ if $r_i \le t < d_i$ and $i$ has not
been scheduled in $S$ before $t$. (Thus all jobs released at time $t$ are considered
pending.) An instance is s-bounded if $d_j - r_j \le s$ for all jobs $j$. Similarly, an
instance is s-uniform if $d_j - r_j = s$ for all $j$. An instance is similarly ordered if
$r_i < r_j$ implies $d_i \le d_j$ for any two jobs $i$ and $j$.

Given two jobs $i, j$, we say that $i$ dominates $j$ if either (i) $d_i < d_j$, or (ii)
$d_i = d_j$ and $w_i > w_j$, or (iii) $d_i = d_j$, $w_i = w_j$ and $i < j$. (Condition (iii)
only ensures that ties are broken in some arbitrary but consistent way.) Given
a non-empty set of jobs $J$, the dominant job in $J$ is the one that dominates all
other jobs in $J$; it is always uniquely defined, as ‘dominates’ is a linear order.

A schedule $S$ is called canonical earliest-deadline if for any jobs $i$ and $j$ in
$S$, where $i$ is scheduled at time $t$ and $j$ is scheduled later, either $j$ is released
strictly after time $t$, or $i$ dominates $j$. In other words, at any time, the job to be
scheduled dominates all pending jobs that appear later in $S$. Any schedule can
easily be converted into a canonical earliest-deadline schedule by rearranging its
jobs. Thus we may assume that offline schedules are canonical earliest-deadline.
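For experimenting with these definitions, an offline optimum $opt(I)$ is useful. The sketch below is our own and relies on two standard facts that we state as assumptions: for unit jobs with integral release times, earliest-deadline-first decides whether a set of jobs can all be executed, and the executable job sets form a (transversal) matroid, so adding jobs greedily by decreasing weight is optimal. `Job` and the helper names are hypothetical.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    id: int
    r: int      # release time
    d: int      # deadline: the job may run at an integer time t with r <= t < d
    w: float    # weight

def dominance_key(job):
    """i dominates j iff dominance_key(i) < dominance_key(j): earlier deadline,
    then larger weight, then smaller id (tie-breaking rule (iii))."""
    return (job.d, -job.w, job.id)

def edf_feasible(jobs):
    """Can all jobs be executed? Simulate earliest-deadline-first on unit jobs."""
    order = sorted(jobs, key=lambda j: j.r)
    pending, k = [], 0
    t = order[0].r if order else 0
    while k < len(order) or pending:
        if not pending and order[k].r > t:
            t = order[k].r                      # idle until the next release
        while k < len(order) and order[k].r <= t:
            heapq.heappush(pending, (order[k].d, order[k].id))
            k += 1
        deadline, _ = heapq.heappop(pending)
        if deadline <= t:
            return False                        # earliest-deadline job has already expired
        t += 1
    return True

def opt_gain(jobs):
    """Offline optimum: add jobs by decreasing weight whenever the set stays feasible."""
    chosen = []
    for job in sorted(jobs, key=lambda j: -j.w):
        if edf_feasible(chosen + [job]):
            chosen.append(job)
    return sum(j.w for j in chosen)

jobs = [Job(0, 0, 1, 1.0), Job(1, 0, 2, 1.5)]
print(sorted(jobs, key=dominance_key)[0].id, opt_gain(jobs))
```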



3 A 64/33-Competitive Algorithm 

We start with some intuitions that should be helpful in understanding the
algorithm and its analysis. The greedy algorithm that always executes the heaviest
job (H-job) is not better than 2-competitive. An alternative idea is to execute
the earliest deadline job at each step. This algorithm is not competitive at all,
as it could execute many small weight jobs even if there are heavy jobs pending
with only slightly larger deadlines. A natural refinement of this approach is to
focus on sufficiently heavy jobs, of weight at least $\alpha$ times the maximal weight,
and choose the earliest deadline job among those (an E-job). As it turns out, this
algorithm is also not better than 2-competitive.

The general idea of our new algorithm is to alternate H-jobs and E-jobs.
Although this simple algorithm, as stated, still has ratio no better than 2, we
can reduce the ratio by introducing some minor modifications.

Algorithm GenFlag: We use parameters α = ^, β = ^, and a Boolean
variable $E$, initially set to 0, that stores information about the previous step. At
a given time step $t$, update the set of pending jobs (remove jobs with deadline $t$
and add jobs released at $t$). If there are no pending jobs, go to the next time step.
Otherwise, let $h$ be the heaviest pending job (breaking ties in favor of dominant
jobs) and $e$ the dominant job among the pending jobs with weight at least $\alpha w_h$.
Schedule either $e$ or $h$ according to the following procedure:
if E = 0
  then schedule e; if e ≠ h then set E ← 1
  else if d_e = t + 1 and w_e ≥ β·w_h then schedule e else schedule h;
       set E ← 0

A job $e$ scheduled while $E = 0$ is called an O-job if $e = h$, or an E-job
otherwise. A job $e$ scheduled while $E = 1$ is a U-job. A job $h$ scheduled in the
last case is an H-job. The letters stand for Obvious, Early, Urgent, and Heaviest.

Variable $E$ is 1 iff the previous job was an E-job. Thus, in terms of the
labels, the algorithm proceeds as follows: if an O-job is available, we execute it.
Otherwise, we execute an E-job, and in the next step either a U-job (if available)
or an H-job. (There always is a pending job at this next step, thanks to the
condition $e \ne h$: if $d_h = t + 1$ then $e = h$ by the definition of dominance; so if an
E-job is executed at time $t$, then $d_h > t + 1$ and $h$ is pending in the next step.)
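The rules of GenFlag can be simulated directly. The numerical values of $\alpha$ and $\beta$ are not legible in this copy, so the sketch below (our own, with hypothetical names) takes them as parameters; the values 0.5 used in the example are arbitrary and are not the paper's. For comparison, it also simulates the greedy algorithm that always runs the heaviest pending job.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    id: int
    r: int      # release time
    d: int      # deadline (runs at integer t with r <= t < d; assumes r < d)
    w: float    # weight

def dominance_key(job):
    return (job.d, -job.w, job.id)       # smaller key = dominant job

def heaviest(pending):
    return min(pending, key=lambda j: (-j.w, j.d, j.id))   # ties broken towards the dominant job

def genflag(jobs, alpha, beta):
    """Return (gain, log) where log holds (t, job id, label) with label in O/E/U/H."""
    if not jobs:
        return 0.0, []
    pending, flag, gain, log = set(), 0, 0.0, []          # flag plays the role of E
    for t in range(min(j.r for j in jobs), max(j.d for j in jobs)):
        pending = {j for j in pending if j.d > t} | {j for j in jobs if j.r == t}
        if not pending:
            continue
        h = heaviest(pending)
        e = min((j for j in pending if j.w >= alpha * h.w), key=dominance_key)
        if flag == 0:
            job, label = e, ("O" if e == h else "E")
            flag = 0 if e == h else 1
        else:
            if e.d == t + 1 and e.w >= beta * h.w:
                job, label = e, "U"
            else:
                job, label = h, "H"
            flag = 0
        pending.discard(job)
        gain += job.w
        log.append((t, job.id, label))
    return gain, log

def greedy(jobs):
    """Always run the heaviest pending job."""
    if not jobs:
        return 0.0
    pending, gain = set(), 0.0
    for t in range(min(j.r for j in jobs), max(j.d for j in jobs)):
        pending = {j for j in pending if j.d > t} | {j for j in jobs if j.r == t}
        if pending:
            job = heaviest(pending)
            pending.discard(job)
            gain += job.w
    return gain

jobs = [Job(0, 0, 1, 1.0), Job(1, 0, 2, 1.5)]
print(genflag(jobs, alpha=0.5, beta=0.5), greedy(jobs))
```

On this two-job instance the E-job at time 0 is followed by a U-job, whereas greedy loses the lighter job with the earlier deadline.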

Theorem 3.1. GenFlag is a 64/33-competitive deterministic algorithm for
unit-job scheduling.

Proof. The proof is by a charging scheme. Fix an arbitrary (offline) schedule
ADV. For each job $j$ executed in ADV, we partition its weight $w_j$ into one or
two charges, and assign each charge to a job executed by GenFlag. If the total
charge to each job $i$ of GenFlag were at most $R w_i$, the $R$-competitiveness of
GenFlag would follow by summation over all jobs. Our charging scheme does
not always meet this simple condition. Instead, we divide the jobs of GenFlag
into disjoint groups, where each group is either a single O-job, or an EH-pair (an
E-job followed by an H-job), or an EU-pair (an E-job followed by a U-job). This
is possible by the discussion of job types before the theorem. For each group
we prove that its charging ratio is at most $R$, where the charging ratio is defined
as the total charge to this group divided by the total weight of the group. This
implies that GenFlag is $R$-competitive by summation over all groups.

Charging scheme. Let j be the job executed at time t in ADV. Denote by 
i and h, respectively, the job executed by GenFlag and the heaviest pending 
job at time t. (If there are no pending jobs, introduce a “dummy” job of weight 
0. This does not change the algorithm.) Then j is charged to GenFlag’s jobs, 
according to the following rules. 

(EB) If $j$ is executed by GenFlag before time $t$, then charge $(1 - \beta)w_h$ to $i$
and the remaining $w_j - (1 - \beta)w_h$ to $j$.

(EF) Else, if $j$ is executed by GenFlag after time $t$, then charge $\beta w_j$ to $i$
and $(1 - \beta)w_j$ to $j$.

(NF) Else, if $i$ is an H-job, $w_j \ge \beta w_i$, and ADV executes $i$ after time $t$, then
charge $\beta w_j$ to $i$ and $(1 - \beta)w_j$ to the job executed by GenFlag at time
$t + 1$.

(U) Else, charge $w_j$ to $i$. (Note that this case includes the case $i = j$.)

We label all charges as EB, EF, NF, U, according to which case above applies.
We also distinguish upward, forward, and backward charges, defined in the
obvious way. Thus, for example, in case (EB), the charge of $w_j - (1 - \beta)w_h$ to $j$
is a backward EB-charge. The letters in the labels refer to whether $j$ was
executed by GenFlag, and to the charge direction: Executed-Backward, Executed-Forward,
Non-executed-Forward, Upward. We start with several simple
observations used later in the proof, sometimes without explicit reference.

Observation 1: At time $t$, let $h$ be the heaviest pending job of GenFlag and $e$ the
dominant job of weight at least $\alpha w_h$. Let $j$ be any pending job. Then

(1.1) $w_j \le w_h$.

(1.2) If $j$ dominates $e$ then $w_j < \alpha w_h$.

Proof: Inequality (1.1) is trivial, by the definition of $h$. Inequality (1.2) follows
from the fact that $e$ dominates all pending jobs with weight at least $\alpha w_h$.

Observation 2: Suppose that at time $t$ GenFlag schedules a job that receives
a forward NF-charge from the job $l$ scheduled at time $t - 1$ in ADV. Then $l$ is
pending at time $t$.

Proof: Denote by $f$ the heaviest pending job of GenFlag at time $t - 1$. By
the definition of NF-charges, $l \ne f$, $l$ is pending at time $t - 1$, $f$ is executed
as an H-job, $d_f \ge t + 1$, and $w_l \ge \beta w_f$. Therefore $d_l \ge t + 1$, since otherwise
GenFlag would execute $l$ (or some other job) as a U-job at step $t - 1$. Thus $l$
is also pending at step $t$, as claimed.

Observation 3: Suppose that at time $t$ GenFlag schedules a job $e$ of type O
or E, and $h$ is the heaviest pending job at time $t$. Then

(3.1) The upward charge to $e$ is at most $w_h$.

(3.2) If ADV executes $e$ after time $t$ then the upward charge is at most $\alpha w_h$.

(3.3) The forward NF-charge to $e$ is at most $(1 - \beta)w_h$.

Proof: Denote by $j$ the job executed at time $t$ in ADV. If $j$ is scheduled by
GenFlag before time $t$, then the upward charge is $(1 - \beta)w_h \le \alpha w_h$ and claims
(3.1) and (3.2) hold. Otherwise $j$ is pending at time $t$. Then the upward charge
is at most $w_j \le w_h$ and (3.1) follows. If ADV executes $e$ after time $t$, then, since
ADV is canonical, job $j$ must dominate $e$, and (3.2) follows from (1.2). (3.3)
follows by Observation 2 and (1.1).

Observation 4: Suppose that at time $t$ GenFlag executes an H-job $h$ that is
executed after time $t$ in ADV. Then the upward charge to $h$ is at most $\beta w_h$.

Proof: Let $j$ be the job executed at time $t$ in ADV. In case (EB), the upward
charge to $h$ is at most $(1 - \beta)w_h \le \beta w_h$. In all other cases, $j$ is pending at time
$t$, so $w_j \le w_h$. In cases (EF) and (NF), the charge is $\beta w_j \le \beta w_h$. In case (U) the
charge is $w_j$, but since (NF) did not apply, this case can occur only if $w_j < \beta w_h$.
Now we examine the charges to all job groups in GenFlag's schedule.

O-jobs. Let $e = h$ be an O-job executed at time $t$. The forward NF-charge is
at most $(1 - \beta)w_e$. If $e$ gets a backward EB-charge (which is at most $w_e$), then $e$ gets
no forward EF-charge and the upward charge is at most $\alpha w_e$; the total charging
ratio is at most $2 + \alpha - \beta < R$. If $e$ does not get a backward EB-charge, then the
forward EF-charge is at most $(1 - \beta)w_e$ and the upward charge is at most $w_e$; the
charging ratio is at most $3 - 2\beta < R$.

E-jobs. Before considering EH-pairs and EU-pairs, we estimate separately the
charges to E-jobs. Suppose that at time $t$ GenFlag executes an E-job $e$, and
the heaviest job is $h$. We claim that $e$ is charged at most $\alpha w_h + (2 - \beta)w_e$. We
have two cases.

Case 1: $e$ gets no backward EB-charge. Then, in the worst case, $e$ gets an upward
charge of $w_h$, a forward NF-charge of $(1 - \beta)w_h$, and a forward EF-charge of
$(1 - \beta)w_e$. Using $(2 - \beta)w_h = 2\alpha w_h$ and $\alpha w_h \le w_e$, the total charge is at most
$(2 - \beta)w_h + (1 - \beta)w_e \le \alpha w_h + (2 - \beta)w_e$, as claimed.

Case 2: $e$ gets a backward EB-charge. Then there is no forward EF-charge and
the upward charge is at most $\alpha w_h$ by (3.2). Let $l$ be the job scheduled at time
$t - 1$ in ADV. If there is an NF-charge, it is generated by $l$, and then $l$ is pending
for GenFlag at time $t$ by Observation 2.

If there is a forward NF-charge and $d_l \ge d_e$, then $l$ is pending at the time
when ADV schedules $e$. (In this case $l$ cannot be executed by GenFlag, because
charges EB, EF did not apply to $l$.) Consequently, $e$ receives at most a backward
EB-charge $w_e - (1 - \beta)w_l$, a forward NF-charge $(1 - \beta)w_l$, and an upward charge
$\alpha w_h$. The total is at most $\alpha w_h + w_e \le \alpha w_h + (2 - \beta)w_e$, as claimed.

Otherwise, there is no forward NF-charge, or there is a forward NF-charge
and $d_l < d_e$. In the second case, $l$ is pending at $t$ and $l$ dominates $e$, so by (1.2) the
forward NF-charge is at most $(1 - \beta)w_l \le (1 - \beta)w_e$. With the backward EB-charge
of at most $w_e$ and the upward charge of at most $\alpha w_h$, the total is at most
$\alpha w_h + (2 - \beta)w_e$, as claimed.

EH-pairs. Let $e$ be the E-job scheduled at time $t$, $h$ the heaviest pending job
at time $t$, and $h'$ the H-job at time $t + 1$. By the algorithm, $e \ne h$. Note that,
since GenFlag did not execute $h$ as an O-job at time $t$, $h$ is still pending after
the E-step and $w_{h'} \ge w_h$.

We now estimate the charge to $h'$. There is no forward NF-charge, as the
previous step is not an H-step. If there is a backward EB-charge, the additional
upward charge is at most $\beta w_{h'}$ by Observation 4 and the total is at most
$(1 + \beta)w_{h'}$. If there is no EB-charge, the sum of the upward charge and a forward
EF-charge is at most $w_{h'} + (1 - \beta)w_{h'} \le (1 + \beta)w_{h'}$. With the charge to $e$ of at
most $\alpha w_h + (2 - \beta)w_e$, the total charge of the EH-pair is at most
$\alpha w_h + (2 - \beta)w_e + (1 + \beta)w_{h'}$, and thus the charging ratio is at most

$$2 - \beta + \frac{\alpha w_h + (2\beta - 1)w_{h'}}{w_e + w_{h'}} \le 2 - \beta + \frac{\alpha w_h + (2\beta - 1)w_{h'}}{\alpha w_h + w_{h'}} \le 2 - \beta + \frac{\alpha + 2\beta - 1}{\alpha + 1} = R.$$

The first step follows from $w_e \ge \alpha w_h$. As $2\beta - 1 < 1$, the next expression is
decreasing in $w_{h'}$, so the maximum is at $w_{h'} = w_h$, and it is equal to $R$ by the
definitions of $\alpha$ and $\beta$.

EU-pairs. As in the previous case, let $e$ and $h$ denote the E-job scheduled
at time $t$ and the heaviest pending job at time $t$. By $g$ and $h'$ we denote the
scheduled U-job and the heaviest pending job at time $t + 1$. As in the case of
EH-pairs, $e \ne h$ and $w_{h'} \ge w_h$.

Job $g$ gets no backward EB-charge, since it expires, and no forward NF-charge,
since the previous step is not an H-step. The upward charge is at most
$w_{h'}$, and the forward EF-charge is at most $(1 - \beta)w_g$.
With the charge to $e$ of at most $\alpha w_h + (2 - \beta)w_e$, the charge of the EU-pair
is at most $\alpha w_h + (2 - \beta)w_e + w_{h'} + (1 - \beta)w_g$, so the charging ratio is at most

$$2 - \beta + \frac{\alpha w_h + w_{h'} - w_g}{w_e + w_g} \le 2 - \beta + \frac{\alpha w_h + (1 - \beta)w_{h'}}{\alpha w_h + \beta w_{h'}} \le 2 - \beta + \frac{\alpha + 1 - \beta}{\alpha + \beta} = R.$$

In the first step, we apply the bounds $w_e \ge \alpha w_h$ and $w_g \ge \beta w_{h'}$ (if the numerator
$\alpha w_h + w_{h'} - w_g$ is negative, the ratio is already below $2 - \beta < R$). As $1 - \beta < \beta$, the
next expression is decreasing in $w_{h'}$, so the maximum is at $w_{h'} = w_h$. □
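The arithmetic behind the group bounds can be double-checked with exact rational arithmetic. The sketch below (Python's `fractions` module; the helper names `eh_ratio` and `eu_ratio` are hypothetical) evaluates the O-job, EH-pair and EU-pair bounds for $\alpha = 7/11$, $\beta = 8/11$, and spot-checks the two pair ratios on a small grid of weights satisfying the constraints used in the proof ($w_e \ge \alpha w_h$, $w_{h'} \ge w_h$, and $w_g \ge \beta w_{h'}$ for EU-pairs). A grid check is only a sanity test, not a proof.

```python
from fractions import Fraction
from itertools import product

ALPHA = Fraction(7, 11)
BETA = Fraction(8, 11)
R = Fraction(64, 33)

# Closed-form bounds from the proof of Theorem 3.1.
o_with_eb = 2 + ALPHA - BETA
o_without_eb = 3 - 2 * BETA
eh_bound = 2 - BETA + (ALPHA + 2 * BETA - 1) / (ALPHA + 1)
eu_bound = 2 - BETA + (ALPHA + 1 - BETA) / (ALPHA + BETA)


def eh_ratio(wh, we, whp):
    # Total EH-pair charge divided by the pair's weight w_e + w_{h'}.
    return (ALPHA * wh + (2 - BETA) * we + (1 + BETA) * whp) / (we + whp)


def eu_ratio(wh, we, whp, wg):
    # Total EU-pair charge divided by the pair's weight w_e + w_g.
    return (ALPHA * wh + (2 - BETA) * we + whp + (1 - BETA) * wg) / (we + wg)


# Ratios are scale-invariant, so fix w_h = 1 and vary the other weights.
steps = [Fraction(k, 2) for k in range(5)]   # 0, 1/2, 1, 3/2, 2
worst_eh = max(eh_ratio(1, ALPHA + s, 1 + u) for s, u in product(steps, steps))
worst_eu = max(eu_ratio(1, ALPHA + s, 1 + u, BETA * (1 + u) + v)
               for s, u, v in product(steps, steps, steps))
print(o_with_eb, o_without_eb, eh_bound, eu_bound, worst_eh, worst_eu)
```

Both pair bounds come out exactly equal to $R$, and on the grid the maximum is attained at the extreme point $w_e = \alpha w_h$, $w_{h'} = w_h$ (and $w_g = \beta w_{h'}$), which is where the proof places the worst case.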



4 Similarly Ordered Jobs 

We now consider the case when the jobs are similarly ordered, that is, $r_i < r_j$
implies $d_i \le d_j$ for all jobs $i, j$.

Algorithm SimFlag: We use one parameter $\alpha = \sqrt{10}/5 \approx 0.632$ and a Boolean
variable $E$, initially set to 0. At a given time step $t$, update the set of pending
jobs. If there are no pending jobs, go to the next time step. Otherwise, let $h$
be the heaviest pending job (breaking ties in favor of dominant jobs) and $e$ the
dominant job among the pending jobs with weight at least $\alpha w_h$. Schedule either
$e$ or $h$ according to the following procedure:

    if $E = 0$ then schedule $e$; if $e \ne h \wedge d_e > t + 1$ then set $E \leftarrow 1$
    else schedule $h$; set $E \leftarrow 0$

A job $e$ scheduled while $E = 0$ is called an O-job if $e = h$ or $d_e = t + 1$, and
an E-job otherwise. A job $h$ scheduled while $E = 1$ is called an H-job. The
intuition behind the algorithm is similar to GenFlag, and, in fact, it is simpler.
It schedules O-jobs, if available, and if not, it schedules an E-job followed by an
H-job. (By the condition $e \ne h$, there is a pending job in the next step.)
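For comparison with GenFlag, SimFlag's rule can be sketched in the same style. Again this is a hypothetical sketch, not the authors' code: jobs are assumed executable at times $r_j \le t < d_j$, and dominance is assumed to order jobs by earlier deadline, then larger weight, then name. The helper `is_similarly_ordered` (also a hypothetical name) checks the input condition $r_i < r_j \Rightarrow d_i \le d_j$.

```python
import math
from dataclasses import dataclass

ALPHA = math.sqrt(10) / 5   # about 0.632


@dataclass(frozen=True)
class Job:
    name: str
    release: int
    deadline: int   # assumption: executable at times t with release <= t < deadline
    weight: float


def dominance_key(job):
    # Hypothetical dominance order: earlier deadline, then heavier, then name.
    return (job.deadline, -job.weight, job.name)


def is_similarly_ordered(jobs):
    # r_i < r_j must imply d_i <= d_j for every pair of jobs.
    return all(i.deadline <= j.deadline
               for i in jobs for j in jobs if i.release < j.release)


class SimFlag:
    def __init__(self):
        self.flag = 0          # the Boolean variable E
        self.pending = set()

    def step(self, t, released=()):
        self.pending = {j for j in self.pending if j.deadline > t}
        self.pending |= set(released)
        if not self.pending:
            return None
        h = min(self.pending, key=lambda j: (-j.weight, dominance_key(j)))
        e = min((j for j in self.pending if j.weight >= ALPHA * h.weight),
                key=dominance_key)
        if self.flag == 0:
            chosen = e                      # O-job or E-job
            if e != h and e.deadline > t + 1:
                self.flag = 1               # E-job: an H-job follows
        else:
            chosen = h                      # H-job
            self.flag = 0
        self.pending.discard(chosen)
        return chosen


jobs = [Job("a", 0, 3, 10.0), Job("b", 0, 2, 8.0)]
alg = SimFlag()
first = alg.step(0, jobs)    # b dominates a and d_b = 2 > t + 1: an E-job
second = alg.step(1)         # the flag forces the heaviest pending job, a
```

Unlike the GenFlag sketch, there is no U-job branch: after an E-job the next step always takes the heaviest pending job.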

Theorem 4.1. SimFlag is a $(5 - \sqrt{10}) \approx 1.838$-competitive deterministic algorithm
for unit-job scheduling for similarly ordered instances.

The proof is by a charging scheme, similar to the one for GenFlag. The key
observation is that if after scheduling an E-job at time $t$, there is a pending job
with deadline $t + 2$ or more, no job with deadline $t + 1$ can be released, and thus
we do not need the special case of U-jobs. This simplifies both the algorithm
and the analysis, and improves the competitive ratio.

We describe the charging scheme below. The rest of the proof, showing that
the charging ratio of each group is at most $R$, is similar to that for GenFlag and is
omitted in this extended abstract.

Charging scheme. Let $j$ be the job executed at time $t$ in ADV. Denote by $i$
and $h$, respectively, the job executed by SimFlag and the heaviest pending job
at time $t$. (Without loss of generality, we can assume that such jobs exist.) Let
$\beta = 4 - \sqrt{10} \approx 0.838$. Then $j$ is charged to SimFlag's jobs, according to the
following rules.

(EB) If $j$ is executed by SimFlag at or before time $t$, then charge $w_j$ to $j$.

(NF) Else, if $d_j > t + 1$ and $i$ is an H-job, then charge $\beta w_j$ to $i$ and $(1 - \beta)w_j$
to the job scheduled by SimFlag at time $t + 1$.

(U) Else, charge $w_j$ to $i$.
5 2-Uniform Instances

Let $Q \approx 1.377$ be the largest root of $Q^3 + Q^2 - 4Q + 1 = 0$. First we prove a lower
bound of $Q$ on the competitive ratio and then we give a matching algorithm,
both for 2-uniform instances, i.e., instances in which each job $j$ satisfies $d_j = r_j + 2$.

At each step $t$, we distinguish old pending jobs, that is, those that were
released at time $t - 1$ but not executed, from the newly released jobs. We can
always ignore all the old pending jobs except for the heaviest one, as only one of
the old pending jobs can be executed. To simplify notation, we identify jobs by
their weight. Thus "job $x$" means the job with weight $x$. Such a job is usually
uniquely defined by the context, possibly after specifying whether it is an old pending
job or a newly released job.



5.1 Lower Bound 



Fix some $0 < \epsilon < 2Q - 2$. We define a sequence $\Psi_i$, $i = 1, 2, \ldots$, as follows. For
$i = 1$, $\Psi_1 = Q - 1 - \epsilon$. Inductively, for $i \ge 1$, let

$$\Psi_{i+1} = \frac{(2 - Q)\Psi_i - (Q - 1)^2}{2 - Q - \Psi_i}.$$


Lemma 5.1. For all $i$, we have $|\Psi_i| < Q - 1 \approx 0.377$. Furthermore, the sequence
$\{\Psi_i\}$ converges to $1 - Q \approx -0.377$.

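Proof. Substituting $z_i = \Psi_i + Q - 1$, we get $z_{i+1} = \frac{(3 - 2Q)z_i}{1 - z_i}$. We show inductively that
$0 < z_i \le 2Q - 2 - \epsilon$. Initially, $z_1 = 2Q - 2 - \epsilon$. Inductively, if $0 < z_i \le 2Q - 2 - \epsilon$,
then $1 - z_i \ge 3 - 2Q + \epsilon$, so $0 < z_{i+1} \le \frac{3 - 2Q}{3 - 2Q + \epsilon}\, z_i < z_i$. Thus $0 < z_i \le 2Q - 2 - \epsilon$ for all $i$,
$\lim_{i \to \infty} z_i = 0$, and the lemma follows. □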
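A quick numerical illustration of Lemma 5.1 (a sketch, not part of the proof): compute $Q$ by bisection on $[1, 2]$, where $Q^3 + Q^2 - 4Q + 1$ is increasing and changes sign, iterate the recurrence for $\Psi_i$ with an arbitrarily chosen $\epsilon = 0.01$, and compare it with the substituted form $z_{i+1} = (3 - 2Q)z_i/(1 - z_i)$. The names `largest_root` and `psi_sequence` are hypothetical.

```python
def largest_root(lo=1.0, hi=2.0, iters=100):
    # Bisection for Q^3 + Q^2 - 4Q + 1 = 0: the value is -1 at Q = 1 and 5 at Q = 2.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid**3 + mid**2 - 4 * mid + 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Q = largest_root()
EPS = 0.01   # any 0 < EPS < 2Q - 2 works; this value is arbitrary


def psi_sequence(n, eps=EPS):
    # Psi_1 = Q - 1 - eps, then the recurrence defined before Lemma 5.1.
    psi = [Q - 1 - eps]
    for _ in range(n - 1):
        p = psi[-1]
        psi.append(((2 - Q) * p - (Q - 1) ** 2) / (2 - Q - p))
    return psi


psi = psi_sequence(200)
z = [p + Q - 1 for p in psi]
print(round(Q, 4), psi[-1])
```

The sequence settles at $1 - Q$ after a few dozen iterations, because once $z_i$ is small each step multiplies it by roughly $3 - 2Q \approx 0.25$.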



Theorem 5.2. There is no deterministic online algorithm for the 2-uniform 
case with competitive ratio smaller than Q. 



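Proof. The proof is by contradiction. Suppose that $A$ is a $(Q - \epsilon)$-competitive
algorithm, for some $\epsilon > 0$ (without loss of generality $\epsilon < 2Q - 2$, as required in the
definition of $\Psi_i$). We develop an adversary strategy that forces $A$'s
ratio to be bigger than $Q - \epsilon$.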

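Let $\Psi_i$ be as defined before Lemma 5.1. For $i \ge 1$ define

$$a_i = \frac{1 - \Psi_i}{Q - 1} \quad\text{and}\quad b_i = \frac{Q(2 - Q - \Psi_i)}{(Q - 1)^2}.$$

By Lemma 5.1, for all $i$, $b_i > a_i > 1$ and, for large $i$, $a_i \approx 3.651$ and $b_i \approx 9.679$.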
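The limits can be checked numerically: as $\Psi_i \to 1 - Q$, the formulas give $a_i \to Q/(Q - 1)$ and $b_i \to Q/(Q - 1)^2$. The sketch below is self-contained (it repeats the bisection for $Q$), uses an arbitrary $\epsilon = 0.01$, and its helper names `a_of` and `b_of` are hypothetical.

```python
def largest_root(lo=1.0, hi=2.0, iters=100):
    # Bisection for the root of Q^3 + Q^2 - 4Q + 1 in [1, 2].
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid**3 + mid**2 - 4 * mid + 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Q = largest_root()
EPS = 0.01   # arbitrary admissible epsilon


def a_of(psi):
    return (1 - psi) / (Q - 1)


def b_of(psi):
    return Q * (2 - Q - psi) / (Q - 1) ** 2


psi = [Q - 1 - EPS]
for _ in range(199):
    p = psi[-1]
    psi.append(((2 - Q) * p - (Q - 1) ** 2) / (2 - Q - p))

a = [a_of(p) for p in psi]
b = [b_of(p) for p in psi]
print(round(a[-1], 3), round(b[-1], 3))
```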

Our strategy proceeds in iterations. It guarantees that at the beginning of
iteration $i$, both $A$ and the adversary have one or two old pending job(s) of
the same weight $x_i$. Note that it is irrelevant whether one or two old pending jobs are
present, and also whether they are the same for $A$ and the adversary.

Initially, we issue two jobs of some arbitrary weight $x_1 > 0$ at time 0 and
start iteration 1 at time 1. Both $A$ and the adversary execute one job $x_1$ and
both have an old pending job with weight $x_1$ at the beginning of iteration 1.

In iteration $i$, $A$ and the adversary start with an old pending job $x_i$. The
adversary now follows this procedure:

        issue one job $a_i x_i$
    (A) if $A$ executes $a_i x_i$ then execute $x_i$, $a_i x_i$ and halt
        else {$A$ executes $x_i$}
            at the next time step issue $b_i x_i$
    (B)     if $A$ executes $b_i x_i$ then execute $x_i$, $a_i x_i$, $b_i x_i$, and halt
            else {$A$ executes $a_i x_i$}
    (C)         at the next time step issue two jobs $x_{i+1} = b_i x_i$
                execute $a_i x_i$, $b_i x_i$, $b_i x_i$

If $A$ executes first $x_i$ and then $a_i x_i$, then after step (C) it executes one job
$b_i x_i$, either the old pending one or one of the two newly released jobs. After this,
both $A$ and the adversary have one or two newly released jobs $b_i x_i$ pending, and
the new iteration starts with $x_{i+1} = b_i x_i$.
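The bookkeeping of one iteration can be summarized by a small function that, given $A$'s two decisions, returns the branch taken, the weight gained by $A$ and by the adversary in this iteration, and the weight of the next old pending job (or `None` if the adversary halts). This is an illustrative sketch: `play_iteration` and its boolean arguments are hypothetical names, it counts only the executions listed in the procedure above, and the sample multipliers in the usage lines are arbitrary rather than the actual $a_i, b_i$.

```python
def play_iteration(x, a, b, a_first, b_second):
    """One iteration of the adversary strategy for old pending weight x.

    a_first:  A executes the new job a*x in the first step (branch (A));
    b_second: otherwise, A executes b*x in the second step (branch (B)).
    Returns (branch, gain_A, gain_adversary, next_x).
    """
    if a_first:
        # (A): A runs a*x; the adversary runs x, then a*x, and halts.
        return "A", a * x, x + a * x, None
    if b_second:
        # (B): A runs x, then b*x; the adversary runs x, a*x, b*x, and halts.
        return "B", x + b * x, x + a * x + b * x, None
    # (C): A runs x, a*x, then one job b*x; the adversary runs a*x, b*x, b*x,
    # and both keep a job of weight b*x pending for the next iteration.
    return "C", x + a * x + b * x, a * x + 2 * b * x, b * x


# Two iterations in which A always ends up in branch (C), with sample a, b values.
x, gain_a, gain_adv = 1.0, 1.0, 1.0   # both already executed one job x_1
for a, b in [(2.0, 4.0), (3.0, 9.0)]:
    _, ga, gv, x = play_iteration(x, a, b, a_first=False, b_second=False)
    gain_a += ga
    gain_adv += gv
```

The three return values of the branches are exactly the terms added to $\mathrm{gain}_i$ and $\mathrm{adv}_i$ in the case analysis of the inductive proof.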

A single complete iteration of the adversary strategy is illustrated in Figure 1. 




Fig. 1. The adversary strategy. We denote $x = x_i$, $a = a_i$, and $b = b_i$. Line segments
represent jobs, dark rectangles mark the slots in which the job is executed by $A$, and
lightly shaded rectangles mark executions by the adversary.



If the game completes $i - 1$ iterations, then define $\mathrm{gain}_i$ and $\mathrm{adv}_i$ to be the
gain of $A$ and the adversary in these steps. We claim that for any $i$, either the
adversary wins the game before iteration $i$, or else in iteration $i$ we have

$$(Q - \epsilon)\,\mathrm{gain}_i - \mathrm{adv}_i \le \Psi_i x_i. \qquad (1)$$

The proof is by induction. For $i = 1$, $(Q - \epsilon)\,\mathrm{gain}_1 - \mathrm{adv}_1 \le (Q - \epsilon - 1)x_1 = \Psi_1 x_1$.

Suppose that (1) holds after iteration $i - 1$, that is, when $A$ and the adversary
have an old pending job with weight $x_i$. If $A$ executes $a_i x_i$ in step (A), then,
denoting by $\rho$ the sequence of all released jobs (up to and including $a_i x_i$), using
the inductive assumption, and substituting the formula for $a_i$, we have

$$\begin{aligned}(Q - \epsilon)\,\mathrm{gain}_A(\rho) - \mathrm{adv}(\rho) &= (Q - \epsilon)\,\mathrm{gain}_i - \mathrm{adv}_i + (Q - \epsilon)a_i x_i - (x_i + a_i x_i)\\ &\le [\Psi_i - 1 + (Q - \epsilon - 1)a_i]\,x_i = -\epsilon a_i x_i < 0,\end{aligned}$$

contradicting the $(Q - \epsilon)$-competitiveness of $A$.
If $A$ executes $x_i$ and then $b_i x_i$ in (B), then, again denoting by $\rho$ the sequence
of all released jobs, using the inductive assumption, and substituting the formulas
for $a_i, b_i$, we obtain a contradiction with the $(Q - \epsilon)$-competitiveness of $A$:

$$\begin{aligned}(Q - \epsilon)\,\mathrm{gain}_A(\rho) - \mathrm{adv}(\rho) &= (Q - \epsilon)\,\mathrm{gain}_i - \mathrm{adv}_i + (Q - \epsilon)(x_i + b_i x_i) - (x_i + a_i x_i + b_i x_i)\\ &\le [\Psi_i + Q - \epsilon - 1 + (Q - \epsilon - 1)b_i - a_i]\,x_i = -\epsilon(1 + b_i)x_i < 0.\end{aligned}$$

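In the remaining possibility (C), $A$ executes first $x_i$, then $a_i x_i$, and then $b_i x_i$.
Using the formulas for $a_i$, $b_i$, $\Psi_{i+1}$, and the defining equation $Q^3 + Q^2 - 4Q + 1 = 0$,
we have

$$\begin{aligned}(Q - \epsilon)\,\mathrm{gain}_{i+1} - \mathrm{adv}_{i+1} &\le (Q - \epsilon)\,\mathrm{gain}_i - \mathrm{adv}_i + (Q - \epsilon)(x_i + a_i x_i + b_i x_i) - (a_i x_i + 2 b_i x_i)\\ &\le [\Psi_i + Q + (Q - 1)a_i - (2 - Q)b_i]\,x_i = b_i \Psi_{i+1} x_i = x_{i+1}\Psi_{i+1}.\end{aligned}$$

This completes the inductive proof of (1).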
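The three algebraic identities used in the induction, namely $\Psi_i - 1 + (Q - 1)a_i = 0$ in case (A), $\Psi_i + Q - 1 + (Q - 1)b_i - a_i = 0$ in case (B), and $\Psi_i + Q + (Q - 1)a_i - (2 - Q)b_i = b_i \Psi_{i+1}$ in case (C), can be spot-checked numerically. The sketch below evaluates their residuals for a few values of $\Psi_i$ in $(1 - Q, Q - 1)$; it is a sanity check, not a substitute for the algebra, and the helper names are hypothetical.

```python
def largest_root(lo=1.0, hi=2.0, iters=100):
    # Bisection for the root of Q^3 + Q^2 - 4Q + 1 in [1, 2].
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid**3 + mid**2 - 4 * mid + 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Q = largest_root()


def a_of(psi):
    return (1 - psi) / (Q - 1)


def b_of(psi):
    return Q * (2 - Q - psi) / (Q - 1) ** 2


def next_psi(psi):
    return ((2 - Q) * psi - (Q - 1) ** 2) / (2 - Q - psi)


def residuals(psi):
    # Each residual should vanish for every admissible psi.
    a, b = a_of(psi), b_of(psi)
    case_a = psi - 1 + (Q - 1) * a
    case_b = psi + Q - 1 + (Q - 1) * b - a
    case_c = psi + Q + (Q - 1) * a - (2 - Q) * b - b * next_psi(psi)
    return case_a, case_b, case_c


samples = [(1 - Q) + k * (2 * Q - 2) / 10 for k in range(1, 10)]
worst = max(abs(r) for psi in samples for r in residuals(psi))
```

Only case (C) needs the cubic: replacing `Q` by a value that is not a root of $Q^3 + Q^2 - 4Q + 1$ leaves the first two residuals at zero but makes the third one nonzero.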

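Lemma 5.1 and (1) imply that, for $i$ large enough, we have $(Q - \epsilon)\,\mathrm{gain}_i -
\mathrm{adv}_i \le \Psi_i x_i < (1 - Q + \epsilon)x_i$. Denoting by $\sigma$ the sequence of all jobs (including
the pending jobs $x_i$), we have

$$\begin{aligned}(Q - \epsilon)\,\mathrm{gain}_A(\sigma) - \mathrm{adv}(\sigma) &= (Q - \epsilon)\,\mathrm{gain}_i - \mathrm{adv}_i + (Q - \epsilon)x_i - x_i\\ &< (1 - Q + \epsilon)x_i + (Q - \epsilon)x_i - x_i = 0.\end{aligned}$$

This contradicts the $(Q - \epsilon)$-competitiveness of $A$. □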

5.2 Upper Bound 

We now present our $Q$-competitive algorithm for the 2-uniform case. Given that
the 2-uniform case seems to be the most elementary case of unit-job scheduling
(without being trivial), our algorithm (and its analysis) is surprisingly difficult.
Recall, however, that, as shown in [2], any algorithm for this case with competitive
ratio below $\sqrt{2}$ needs to use some information about the past. Further,
when the adversary uses the strategy from Theorem 5.2, any $Q$-competitive
algorithm needs to behave in an essentially unique way. Our algorithm was designed
to match this optimal strategy, and then extended (by interpolation) to other
adversarial strategies. Thus we suspect that the complexity of the algorithm is
inherent in the problem and cannot be avoided.

We start with some intuitions. Let $A$ be our online algorithm. Suppose that
at time $t$ we have one old pending job $z$, and two new pending jobs $b, c$ with
$b \ge c$. In some cases, the decision which job to execute is easy. If $c \ge z$, $A$ can
ignore $z$ and execute $b$ in the current step. If $z \ge b$, $A$ can ignore $c$ and execute $z$
in the current step. If $c < z < b$, $A$ faces a dilemma: it needs to decide whether
to execute $z$ or $b$. For $c = 0$, the choice is based on the ratio $z/b$. If $z/b$ exceeds a
certain threshold (possibly dependent on the past), we execute $z$; otherwise we
execute $b$. Taking those constraints into account, and interpolating for arbitrary
values of $c$, we can handle all cases by introducing a parameter $0 \le \eta \le 1$,
and making the decision according to the following procedure:

Procedure SSP$_\eta$: If $z > \eta b + (1 - \eta)c$ schedule $z$, otherwise schedule $b$.
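Procedure SSP$_\eta$ is a one-line rule. A sketch in Python, identifying jobs with their weights as in the text; representing a missing second new job by $c = 0$ is our convention, and `ssp` is a hypothetical name.

```python
def ssp(eta, z, b, c=0.0):
    """Return which job SSP_eta executes: 'z' (old pending) or 'b' (heavier new job).

    z: weight of the old pending job; b >= c: weights of the two new jobs.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if b < c:
        raise ValueError("expected b >= c")
    return "z" if z > eta * b + (1 - eta) * c else "b"


# The easy cases from the text come out the same for every eta:
print(ssp(0.3, z=5.0, b=4.0, c=1.0))   # z >= b, so z is executed
print(ssp(0.3, z=2.0, b=9.0, c=3.0))   # c >= z, so b is executed
```

Because the threshold $\eta b + (1 - \eta)c$ always lies between $c$ and $b$, the parameter $\eta$ only matters in the dilemma case $c < z < b$.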

To derive an online algorithm, say $A$, we examine the adversary strategy
in the lower bound proof. Consider the limit case, when $i \to \infty$, and let
$a_* = \lim_{i \to \infty} a_i = Q/(Q - 1)$ and $b_* = \lim_{i \to \infty} b_i = Q/(Q - 1)^2$. Assume that in
the previous step two jobs $z$ were issued. If the adversary now issues a single
job $a$, then $A$ needs to do the following: if $a < a_* z$, execute $z$, and if $a > a_* z$,
then execute $a$. Thus in this case we need to apply SSP$_\alpha$ with the threshold
$\alpha = 1/a_* = (Q - 1)/Q$.

Then, suppose that in the first step $A$ executed $z$, so that in the next step $a$
is pending. If the adversary now issues a single job $b$, then (assuming $a \approx a_* z$) $A$
must do the following: if $b < b_* z$, execute $a$, and if $b > b_* z$, then execute $b$.
Thus in this case we need to apply SSP$_\beta$ with the threshold $\beta = a_*/b_* = Q - 1$.

Suppose that we execute $a$. In the lower-bound strategy, the adversary would
now issue two jobs $b$ in the next step. But what happens if he issues a single job,
say $c$? Calculations show that $A$, in order to be $Q$-competitive, now needs to use
yet a different parameter $\eta$ in SSP$_\eta$. This parameter is not uniquely determined,
but it must be at least $\gamma = (3 - 2Q)/(2 - Q) > Q - 1$. Further, it turns out that
the same value $\gamma$ can be used on subsequent single-job requests.

Our algorithm is derived from the above analysis: on a sequence of single-job
requests in a row, use SSP$_\eta$ with parameter $\alpha$ in the first step, then $\beta$ in the
second step, and $\gamma$ in all subsequent steps. In general, of course, two jobs can be
issued at each step (or more, but only the two heaviest jobs need to be considered).
We think of an algorithm as a function of several arguments. The values of this
function on the boundary are determined from the optimal adversary strategy,
as explained above. The remaining values are obtained through interpolation.

We now give a formal description of our algorithm. Let

$$\alpha = \frac{Q - 1}{Q} \approx 0.27, \qquad \beta = Q - 1 \approx 0.38, \qquad \gamma = \frac{3 - 2Q}{2 - Q} \approx 0.39,$$

$$\delta(\mu, \xi) = \mu\alpha + (1 - \mu)\,[\beta + (\gamma - \beta)\lambda(\xi)],$$

where $0 \le \mu \le 1$ and $\alpha \le \xi \le \gamma$. Note that for the parameters $\mu, \xi$ within
their ranges, we have $0 \le \lambda(\xi) \le 1$ and $\alpha \le \delta(\mu, \xi) \le \gamma$. Function $\delta$ also satisfies
$\delta(1, \xi) = \alpha$, $\delta(0, \xi) \ge \beta$ for any $\xi$, and $\delta(0, \alpha) = \beta$, $\delta(0, \beta) = \gamma$.

Algorithm Switch. Fix a time step $t$. Let $u, v$ (where $u \ge v$) be the two jobs
released at time $t - 1$, and $b, c$ (where $b \ge c$) be the jobs released at time $t$. Let
$z \in \{u, v\}$ be the old pending job. Let also $\xi$ be the parameter of SSP used at
time $t - 1$ (initially $\xi = \alpha$). Then choose the job to execute as follows:

• If $z = u$, run SSP$_\eta$ with $\eta = \delta(v/u, \xi)$.

• If $z = v$, run SSP$_\eta$ with $\eta = \alpha$.

Theorem 5.3. Algorithm Switch is $Q$-competitive for the 2-uniform case,
where $Q \approx 1.377$ is the largest root of $Q^3 + Q^2 - 4Q + 1 = 0$.

Improved Online Algorithms for Buffer Management in QoS Switches 215 



The proof of the theorem is by tedious case analysis of an appropriate po- 
tential function and will appear in the full version of this paper. 
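The parameter schedule of Switch can be explored numerically. The function $\lambda$ is not specified in this section, so the sketch below plugs in a hypothetical choice, $\lambda(\xi) = \min\{1, \max\{0, (\xi - \alpha)/(\beta - \alpha)\}\}$, which is merely one function consistent with the stated properties ($\lambda(\alpha) = 0$, $\lambda(\beta) = 1$, $0 \le \lambda \le 1$); it is not the authors' definition. With it, the sketch checks the listed identities of $\delta$ and reproduces the sequence $\alpha, \beta, \gamma, \gamma$ for two equal jobs followed by single-job requests ($v = 0$), with the old pending job always being $u$. The names `lam`, `delta` and `switch_eta` are hypothetical.

```python
def largest_root(lo=1.0, hi=2.0, iters=100):
    # Bisection for the root of Q^3 + Q^2 - 4Q + 1 in [1, 2].
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid**3 + mid**2 - 4 * mid + 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Q = largest_root()
ALPHA = (Q - 1) / Q
BETA = Q - 1
GAMMA = (3 - 2 * Q) / (2 - Q)


def lam(xi):
    # HYPOTHETICAL lambda: clipped linear interpolation with lam(ALPHA) = 0
    # and lam(xi) = 1 for xi >= BETA. Not taken from the paper.
    return min(1.0, max(0.0, (xi - ALPHA) / (BETA - ALPHA)))


def delta(mu, xi):
    return mu * ALPHA + (1 - mu) * (BETA + (GAMMA - BETA) * lam(xi))


def switch_eta(z_is_u, u, v, xi):
    """Parameter eta chosen by Switch; u >= v > = 0 with u > 0 are the jobs released at t - 1."""
    return delta(v / u, xi) if z_is_u else ALPHA


# Two equal jobs, then single-job requests (v = 0); the old job is always u.
xi, etas = ALPHA, []
for u, v in [(1.0, 1.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]:
    xi = switch_eta(True, u, v, xi)
    etas.append(xi)
```

With this choice of $\lambda$, $\delta(0, \gamma) = \gamma$, which matches the statement that $\gamma$ can be reused on all subsequent single-job requests; any other admissible $\lambda$ would need to be checked separately.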



Conclusions 

We established the first upper bound better than 2 on the competitiveness of
deterministic scheduling of unit jobs to maximize weighted throughput. There is
still a wide gap between our upper bound of $\approx 1.939$ and the best known lower
bound of $\phi \approx 1.618$. Closing or substantially reducing this gap is a challenging open
problem. We point out that our algorithm GenFlag is not memoryless, as it
uses one bit of information about the previous step. Whether it is possible to
reduce the ratio of 2 with a memoryless algorithm remains an open problem.

Another open problem is to determine the competitiveness for similarly
ordered or s-uniform instances. The best lower bound for similarly ordered
instances is $\phi$ (from the 2-bounded case), and our algorithm is $\approx 1.838$-competitive.
Our algorithm does not obey the FIFO rule; on the other hand, the
1.75-competitive algorithm from the FIFO model [7] does not apply to similarly
ordered instances. It would be interesting to obtain any algorithm that obeys
the FIFO rule and is better than 2-competitive for similarly ordered instances.
Abstract. Many network applications that need to distribute contents
and data to a large number of clients use a hybrid scheme in which one
or more multicast channels are used in parallel to a unicast dissemination.
This way the application can distribute data using one of its available
multicast channels or by sending one or more unicast transmissions. In
such a model the utilization of the multicast channels is critical for the
overall performance of the system. We study the scheduling algorithm of
the sender in such a model. We describe this scheduling problem as an
optimization problem where the objective is to maximize the utilization
of the multicast channel. Our model captures the fact that it may be
beneficial to multicast an object more than once (e.g. page update).
Thus, the benefit depends, among other things, on the last time the
object was sent, which makes the problem much more complex than
previous related scheduling problems. Using the local ratio technique we
obtain a 4-approximation algorithm for the case where the objects are
of fixed size and a 10-approximation algorithm for the general case. We
also consider a special case which may be of practical interest, and prove
that a simple greedy algorithm is a 3-approximation algorithm in this
case.



1 Introduction 

Web caching is a common way to overcome problems such as congestion and 
delay that often arise in the current Internet. Using this mechanism, web caches 
are deployed between web clients and web servers and store a subset of the server 
content. Upon receiving a request from a client (or from another cache) a web 
cache tries to satisfy this request using a local (valid) copy of the required object. 
If the cache does not hold such a copy, it tries to retrieve the object from other 
caches (e.g. using Internet Cache Protocol [15]), or from the original server (e.g. 
using HTTP [7]). 

Web caching can be useful in several respects. From the client's point of
view, the presence of caches reduces the average response time. Internet Service
Providers (ISPs) deploy web caches in order to reduce the load over their links
to the rest of the Internet. Web servers benefit from web cache deployment as
well, since it reduces the number of requests that they must handle.
These advantages led to the concept of a Content Distribution Network (CDN).
CDNs distribute content on behalf of origin web servers in order to offload
work from these servers. Although CDNs differ in their implementation model
(e.g. distribution and redirection techniques) [11], they all take advantage of web
caching in order to bring the content close to the clients.

A common and widespread technique that is used by CDNs to distribute
their content is to push objects from web servers into web caches using a
multicast-based channel. Using a broadcast channel such as a satellite link, a CDN
can push an object into many caches in one transmission instead of many unicast
(terrestrial) transmissions. In this kind of model, web caches receive requests
from clients (or other caches). If a request cannot be satisfied locally (i.e. there
is no valid copy of the required object), the cache sends the request to the origin
server. Then, the server sends the object to the cache using unicast transmission,
or it can distribute it to all caches using its multicast channel.

The above mentioned CDN model is an example of a network application that 
uses both a multicast channel and unicast dissemination. This hybrid scheme is 
used by many other network applications that distribute contents and data to 
a large number of clients. The multicast channel is a limited resource and the 
utilization of the channel is an important issue. Determining which data should 
be sent by multicast and which data should be sent by unicast is a big challenge. 

Other examples of network applications that may use this kind of hybrid
scheme are reliable multicast protocols, real-time streaming protocols, and
context distribution systems. In a reliable multicast protocol (see, e.g., [12,8])
the data must be received by all clients. When a client does not receive a packet,
it sends feedback to the source, which must retransmit this packet. The
retransmissions of lost packets are critical since they may prevent clients from receiving
other packets (according to the client receive window) and may cause timeout 
events. A real-time streaming protocol (see, e.g., [13]) is used to send multimedia 
streams to a set of clients. In this application, retransmission of lost packets is 
not mandatory and clients can use fractional data. However, retransmissions can 
be done to improve quality but these retransmissions are subject to time con- 
straints since delayed data is useless. In both applications the multicast channel 
is used by the source to distribute the original transmissions while lost packets 
may be retransmitted using the multicast channel or the unicast dissemination. 
In a context distribution system, pre-defined items that have time characteristics
and time restrictions are distributed from producers to consumers. In [6]
the authors study the efficiency of such a distribution using both a multicast 
channel and unicast dissemination. 

In this paper we study the problem of utilizing a multicast channel in this 
hybrid scheme. In particular we concentrate on the scheduling algorithm that 
determines which data is sent via multicast channel at any given time. The prob- 
lem of multicast scheduling has been discussed before in many papers. In [2,9] 
the authors present the problem of scheduling objects over a broadcast based 
channel. In both [9] and [2] the broadcast channel is the only available channel
and the objective is to minimize the average response time (i.e. the
period between the arrival of a request and the time at which this request is
satisfied). In this kind of model the scope of the cost function is very limited since
it is proportional to the number of the requesters and the waiting time. Both 
papers focus on the on-line problem and present on-line scheduling algorithms 
for their problem. Acharya and Muthukrishnan [1] extend these results with a
preemption mechanism that allows interrupting a transmission and serving other
requests before resuming the interrupted transmission. Moreover, additional
objective functions are analyzed. Nevertheless, these cost functions still depend on
the service time and they are proportional to the number of requesters and the 
waiting time. Su and Tassiulas [14] analyze the off-line problem of broadcast 
scheduling and formulate it as an optimization problem using a similar objective 
function (i.e. minimize the average response time). 

The off-line problem of multicast scheduling in a hybrid scheme is defined 
in [5]. In that paper the authors describe this scheduling problem as an
optimization problem in which the goal is to maximize the utilization of the multicast
channel. In order to formulate the problem they have defined a benefit function 
that determines the amount of benefit one can get when a specific object is sent 
over the multicast channel at a specific time. This benefit function differs from 
one application to another and it depends on the objective function (e.g. min- 
imizing the delay, minimizing the total transmissions, reducing the utilization 
of the unicast dissemination) and on other parameters such as time constraints 
attached to the specific model. They have shown an algorithm that solves the 
problem in polynomial time for the case where the objects are of fixed size. In 
addition, a 2-approximation algorithm was presented for the general case. 

Nevertheless, in the model used in [5], an object can be sent by multicast only 
once. Such a restricted model cannot be used to accurately describe the scheme 
we presented. In web page distribution pages may have an expiration date, and a 
page may be updated after this time. In reliable multicast and streaming a packet 
can be lost again, and thus there may be a positive benefit from broadcasting it 
again. In this paper we present an extended model for the same hybrid scheme, 
in which the benefit function is more general. In particular, the benefit function 
depends on previous transmissions of the same object, as it can be sent multiple 
times. This scheduling problem is much more complex since the benefit at a 
specific time depends on the previous transmission of an object. 

The problem is defined in Section 2. In Section 3 we present a non-trivial 
transformation that maps this problem to a special case of maximum indepen- 
dent set. Using this mapping we devise a 4-approximation algorithm for the case 
in which the objects have fixed size. In Section 4 we examine the general case 
in which the objects are of variable size, and present a 10-approximation
algorithm for this case. Both algorithms are based on the local ratio technique [4].
In Section 5 we study a simple greedy algorithm. We show that even though in 
general the approximation ratio of this algorithm is unbounded, if the benefit 
function follows certain restrictions, which are of practical interest, it is bounded 
by 3. A discussion related to the hardness of the problem (both in the off-line
and on-line cases) and the proof of the 10-approximation algorithm presented in
Section 4 are left for the full version of this paper. In addition, we also study the
case where $k > 1$ multicast channels are available, and we obtain approximation
algorithms with the same performance guarantees in this case.

2 Model and Problem Definition 

In the off-line problem the sequence of requests is known in advance, namely 
the source determines its scheduling according to full information. This kind of 
a problem takes place in applications such as Information Notification Services 
where the clients specify their requirement in advance. In other applications 
such as CDN or reliable multicast protocols the sequence of requests is obtained 
during the execution of the scheduling. In order to employ an off-line schedul- 
ing algorithm in such models one should predict the sequence of requests. For 
instance the sequence can be predicted using statistical information gathered 
during previous periods. 

First we restrict the discussion to the case where all objects have the same 
size. We divide the time axis into discrete time slots, such that it takes one time 
slot to broadcast one object. In this case a scheduling algorithm determines which 
object should be sent via multicast in each time slot. Later on in Section 4 we 
discuss the general case, where different objects may have different sizes. 

2.1 The Benefit Function 

The benefit function describes the benefit of sending a specific object via the 
multicast channel at a specific time. This benefit is the difference between the 
benefit one can get with and without the multicast channel and it depends 
on the objective function of each application. The benefit of sending an object 
at a specific time depends on the requisiteness of the object before and after 
that time. Therefore, the exact computation of the benefit function requires 
knowledge regarding the full sequence of requests, and it can be computed and 
used only in the off-line problem. In addition, the benefit function depends on 
the last time at which the object has been sent, since this previous transmission 
updates all clients and affects their need for a new copy. According to these 
observations we define the following benefit function. 

Definition 1 (Benefit Function). The benefit function $B_{t,t',j}$ is the benefit of
broadcasting object $p_j$ at time slot $t$, where $t'$ was the last time (before $t$) at which
$p_j$ was broadcast. In addition, $B_{t,0,j}$ is the benefit of broadcasting object $p_j$ at
time slot $t$, where $t$ is the first time slot at which $p_j$ is broadcast. Otherwise
(i.e., where $t' \geq t$) $B_{t,t',j} = 0$.

It is very convenient to represent the benefit function by $m$ matrices of size
$(T+1) \times (T+1)$, where $m$ is the number of objects and $T$ is the number of time
slots in the period. In this kind of representation, the benefit function of each
object is represented by one matrix, and the elements in the matrix describe the
benefit function of the object in the following way: $B_{t,t',j}$ is the element in column
$t$ and row $t'$ of matrix $j$. Note that according to Definition 1, $B_{t,t',j}$ can be non-zero
only if $t' < t$, and therefore the benefit matrices are upper triangular. For instance,
Figure 1 depicts a benefit function of two objects for a period of 3 time slots.
Assuming that the multicast channel is always available, one cannot lose by sending
an object via multicast, hence the benefit function is non-negative
(i.e., $\forall\, 0 \leq t' < t \leq T, \forall j: B_{t,t',j} \geq 0$).

Fig. 1. A Scheduling Example

For example, consider a CDN application that uses a satellite-based channel
as a multicast channel. The CDN server distributes objects that have an
expiration date or a max-age character, and its objective function is to reduce usage
of the unicast dissemination, namely to reduce the amount of data that is sent 
via unicast. An expiration date of an object refers to a time at which the object 
should no longer be returned by a cache unless the cache gets a new (validated) 
version of this object from the server with a new expiration date. A max-age 
time of an object refers to the maximum time, since the last time this object has 
been acquired by a cache, at which the object should no longer be returned by 
the cache unless it gets a new (validated) version of this object from the server 
with a new max-age time (see [7, Section 13.2]). Given a set of objects and their
sequence of requests for a period of $T$ time slots, the benefit function can be
easily built as follows: for each object $p_j$ and for each time slot $t$, we calculate
the benefit of broadcasting $p_j$ at time slot $t$, assuming that $t'$ was the last time
slot before $t$ at which the object was broadcast. This can be done (assuming
that the sequence of requests is known) by simulating the case where the object
is not sent via multicast vs. the case in which the object is sent via multicast.
For instance, consider an object $p_j$ with a max-age of five time slots and assume
that this object is required by cache number one at time slots 2, 6, and 8. We
calculate $B_{5,1,j}$ as follows. If the object is broadcast at time slots 1 and 5, it
is not sent via unicast at all. On the other hand, if the
object is not broadcast at time slot 5 (i.e., only at time slot 1), it must be sent
via unicast at time slot 8, namely $B_{5,1,j}$ is equal to one transmission of $p_j$.
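The simulation described above can be sketched for a single cache. This is a minimal illustration under assumptions that are ours rather than the paper's: a copy obtained at slot $a$ may serve a request at slot $r$ iff $r - a$ is at most the max-age, a broadcast in a slot is received before any request in that slot, and broadcasts other than those at $t'$ and $t$ are ignored. The names `unicasts`, `benefit` and `benefit_matrix` are hypothetical.

```python
from typing import List, Set


def unicasts(requests: Set[int], broadcasts: Set[int], max_age: int, T: int) -> int:
    """Number of unicast transmissions one cache needs over slots 1..T.

    Assumed freshness rule: a copy obtained at slot a can serve a request at
    slot r iff r - a <= max_age; otherwise the cache fetches it via unicast.
    A broadcast in slot s refreshes the copy before any request in slot s.
    """
    acquired = None  # slot at which the cache last obtained a copy
    count = 0
    for s in range(1, T + 1):
        if s in broadcasts:
            acquired = s
        if s in requests and (acquired is None or s - acquired > max_age):
            count += 1  # missing or stale copy: fetch it via unicast
            acquired = s
    return count


def benefit(t: int, t_prev: int, requests: Set[int], max_age: int, T: int) -> int:
    """B_{t,t',j}: unicasts saved by broadcasting at t when the last broadcast was at t_prev.

    t_prev == 0 means there was no earlier broadcast. Simplification: all
    other broadcasts of the object are ignored.
    """
    before = {t_prev} if t_prev > 0 else set()
    return (unicasts(requests, before, max_age, T)
            - unicasts(requests, before | {t}, max_age, T))


def benefit_matrix(requests: Set[int], max_age: int, T: int) -> List[List[int]]:
    """(T+1) x (T+1) matrix whose entry in row t', column t is B_{t,t',j} (0 unless t' < t)."""
    M = [[0] * (T + 1) for _ in range(T + 1)]
    for t in range(1, T + 1):
        for t_prev in range(t):
            M[t_prev][t] = benefit(t, t_prev, requests, max_age, T)
    return M
```

With the numbers of the example, `benefit(5, 1, {2, 6, 8}, max_age=5, T=10)` evaluates to 1, matching $B_{5,1,j}$. Several caches could be handled by summing the per-cache values.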

2.2 Problem Definition 

The Time Dependent Multi Scheduling of Multicast problem (TDMSM) is defined
as follows: given a benefit function, find a feasible schedule with maximum
benefit. A natural way to represent a schedule is by a function $S$, where $S(t)$
stands for the index $j$ of the object that was sent at time slot $t$ ($S$ may be
undefined on part of the time slots). The benefit of a schedule is the sum of the
benefit values that are derived from it, namely if $t$ and $t'$ are two consecutive
transmission time slots of object $p_j$ (with $t' = 0$ if $t$ is the first transmission),
then $B_{t,t',j}$ is added to the total benefit.

For instance, assume that object $p_1$ is sent at time slot 2 and object $p_2$ is sent
at time slots 1 and 3. This schedule can be represented by $S(1) = 2$, $S(2) = 1$,
and $S(3) = 2$, and the total benefit of this schedule is $B_{2,0,1} + B_{1,0,2} + B_{3,1,2}$.

Another way to represent a schedule is by a set of triples, where each triple
$(t, t', j)$ contains the index of the object $j$, the time $t$ at which it was sent, and the
last time $t'$ before $t$ at which the object was sent (or 0 if the object was not sent
before). In this kind of representation, the triple $(t, t', j)$ is in the set if and
only if $t$ and $t'$ are two consecutive transmissions of object $p_j$ ($t > t'$). For
instance, the schedule described above can be represented by the following set
of triples: $\{(2, 0, 1), (1, 0, 2), (3, 1, 2)\}$. Although the notation of triples seems to
be unnatural and more complex, it has the advantage that the benefit values
are derived directly from the triples. In other words, each triple contains the
time slots of two consecutive transmissions; therefore the triple $(t, t', j)$ is in the
set if and only if the benefit value $B_{t,t',j}$ should be added to the total benefit
(in the former representation, using a set of pairs $(t, S(t))$, one should find
consecutive transmissions by parsing the whole set). Nevertheless, not every set of
triples represents a feasible schedule. For instance, if the triple $(6, 4, 3)$ is in the
set, it indicates that object $p_3$ was sent at time slots 4 and 6. Since other objects
cannot be sent at these time slots, the triple $(7, 6, 9)$ cannot be in the set (i.e., object
$p_9$ cannot be sent at time slot 6). Moreover, the triple $(6, 4, 3)$ indicates that time
slots 4 and 6 are two consecutive transmissions of object $p_3$, namely it was not
sent at time slot 5; therefore the triple $(8, 5, 3)$ cannot be in the set. On the other
hand, it is clear that every schedule can be represented by a set of triples (every
two consecutive transmissions of an object form a triple in the set). In Section 3
we use the notation of triples and the correspondence between triples and benefit
values to represent the scheduling problem as an independent set problem and
to present efficient scheduling algorithms that are based on this representation.
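The two representations can be related by a short sketch. It assumes, for illustration only, that a schedule is a dict mapping each used time slot to an object index (idle slots are absent) and that the benefit function is a dict keyed by triples $(t, t', j)$ in which missing keys count as zero; `schedule_to_triples` and `schedule_benefit` are hypothetical names.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[int, int, int]  # (t, t_prev, j): consecutive transmissions of object j


def schedule_to_triples(S: Dict[int, int]) -> Set[Triple]:
    """Triple representation of a schedule S (time slot -> object; idle slots absent)."""
    last: Dict[int, int] = {}  # object -> slot of its most recent transmission
    triples: Set[Triple] = set()
    for t in sorted(S):
        j = S[t]
        triples.add((t, last.get(j, 0), j))  # t' = 0 marks the first transmission
        last[j] = t
    return triples


def schedule_benefit(S: Dict[int, int], B: Dict[Triple, float]) -> float:
    """Total benefit of S: the sum of B_{t,t',j} over its triples (missing keys count as 0)."""
    return sum(B.get(tr, 0.0) for tr in schedule_to_triples(S))
```

For the schedule of the example, `schedule_to_triples({1: 2, 2: 1, 3: 2})` yields `{(2, 0, 1), (1, 0, 2), (3, 1, 2)}`, so `schedule_benefit` adds exactly $B_{2,0,1} + B_{1,0,2} + B_{3,1,2}$.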



3 A 4-Approximation Algorithm

In Section 2.2 we showed that a feasible schedule can be represented by a set of
triples, where each triple $(t, t', j)$ contains the index of the object $j$ and the time
slots of two consecutive transmissions of this object: $t$ and $t'$ ($t > t'$). In this
section we use this notation and the correspondence between triples and benefit values
(i.e., the triple $(t, t', j)$ is in the set if and only if the benefit value $B_{t,t',j}$ is added
to the total benefit) to devise a 4-approximation algorithm for TDMSM. The
algorithm uses the local ratio technique [4] and is based on the representation
of TDMSM as a special case of the maximum weight independent set problem.

It is very convenient to represent a triple $(t, t', j)$ as an interval, where the end
points of the interval are $t$ and $t'$. Using this kind of representation we can describe
the relations between triples according to the location of the corresponding
intervals. We say that a triple $(t, t', j)$ intersects a triple $(t_1, t_1', j)$ (where $t > t'$
and $t_1 > t_1'$) if the intervals $(t', t)$ and $(t_1', t_1)$ intersect. In other words, the triples
intersect if $t_1' < t$ and $t_1 > t'$ (the right part of Figure 2 depicts the intervals
of two triples $(t, t', j)$ and $(t_1, t_1', j)$ that intersect). A triple $(t, t', j)$ overlaps a
triple $(t_1, t_1', k)$ (where $t > t'$, $t_1 > t_1'$ and $j \neq k$) if the intervals $(t', t)$ and $(t_1', t_1)$
have at least one common end point that is not zero (i.e., $t = t_1$ or $t' = t_1$ or
$t = t_1'$ or $t' = t_1' \neq 0$) (the left part of Figure 2 depicts the intervals of two triples
$(t, t', j)$ and $(t_1, t_1', k)$ that overlap). A set of triples is called feasible if it does
not contain overlapping triples or intersecting triples. A feasible set of triples $S$
is called maximal if for every $(t, t', j) \notin S$, the set $S' = \{(t, t', j)\} \cup S$ is not feasible
(i.e., one cannot add a triple to the set without turning it into an infeasible set).

Fig. 2. Overlapping and intersecting triples.
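The two relations and the resulting feasibility test translate directly into predicates on triples $(t, t', j)$. The sketch below is our own transcription of the definitions, with hypothetical function names; it simply checks every pair, so it is quadratic in the size of the set.

```python
from itertools import combinations
from typing import Iterable, Tuple

Triple = Tuple[int, int, int]  # (t, t_prev, j) with t > t_prev >= 0


def intersects(a: Triple, b: Triple) -> bool:
    """Same object and the intervals (t', t) and (t1', t1) intersect."""
    (t, tp, j), (t1, t1p, k) = a, b
    return j == k and t1p < t and t1 > tp


def overlaps(a: Triple, b: Triple) -> bool:
    """Different objects whose intervals share an end point other than 0."""
    (t, tp, j), (t1, t1p, k) = a, b
    return j != k and (t == t1 or tp == t1 or t == t1p or tp == t1p != 0)


def is_feasible(triples: Iterable[Triple]) -> bool:
    """Feasible iff no two distinct triples overlap or intersect."""
    return not any(intersects(a, b) or overlaps(a, b)
                   for a, b in combinations(set(triples), 2))
```

On the examples of Section 2.2, `is_feasible({(6, 4, 3), (7, 6, 9)})` and `is_feasible({(6, 4, 3), (8, 5, 3)})` are both `False`, while the triple set $\{(2, 0, 1), (1, 0, 2), (3, 1, 2)\}$ of the earlier schedule is feasible.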

Lemma 1. A set of triples that describes a feasible schedule is feasible. 

Proof. Consider a feasible schedule in which an object $p_j$ is sent at time slots
$t_{j_1} < t_{j_2} < \dots < t_{j_k}$. The scheduling of object $p_j$ is represented by the set of
triples $\{(t_{j_1}, 0, j), (t_{j_2}, t_{j_1}, j), \dots, (t_{j_k}, t_{j_{k-1}}, j)\}$, whose intervals are pairwise
disjoint; therefore it does not contain intersecting triples. Also, in a feasible schedule
only one object can be sent in every time slot, namely object $p_i$ ($i \neq j$) cannot be
sent at time slots $t_{j_1}, \dots, t_{j_k}$; therefore the set does not contain overlapping triples. □

Lemma 2. A maximal set of triples S induces a feasible schedule. 

Proof. Since $S$ does not contain overlapping triples, at most one object is sent in
every time slot. Since $S$ does not contain intersecting triples, if $(t, t', j) \in S$, then
$t$ and $t'$ are two consecutive transmissions of $p_j$. The maximality requirement
ensures that if $t$ and $t'$ are two consecutive broadcasts of $p_j$, then $(t, t', j) \in S$. □

Given a benefit function, we can construct a weighted graph $G = (V, E)$, in
which every triple $(t, t', j)$ is represented by a vertex $v_{t,t',j} \in V$ whose weight is
$w(v_{t,t',j}) = B_{t,t',j}$. An edge $(v_{t,t',j}, v_{t_1,t_1',k}) \in E$ if $(t, t', j)$ intersects or overlaps
$(t_1, t_1', k)$. Henceforth we say that $v_{t,t',j}$ intersects (overlaps) $v_{t_1,t_1',k}$ if the triple
$(t, t', j)$ intersects (overlaps) the triple $(t_1, t_1', k)$.

By the construction of $G$, finding a maximum weight maximal independent set is
equivalent to finding a solution to TDMSM. A maximal independent set
implies a feasible schedule, since an independent set does not contain adjacent
vertices, namely the corresponding set of triples does not contain overlapping or
intersecting triples (and, by Lemma 2, a maximal such set induces a feasible schedule).
Moreover, the weight of the independent set is derived from the weights of the
vertices in the set, which are equal to the benefit values in the schedule. Similarly,
a feasible schedule does not contain intersecting or overlapping triples (Lemma 1);
therefore the corresponding vertices in $G$ form an independent set, and the benefit
of the schedule is the same as the weight of the set.



Time Dependent Multi Scheduling of Multicast 223 



In general, maximum independent set is hard to approximate [10]. Nevertheless,
the graph that is derived from a TDMSM instance is a special case, in
which the edges of the graph are characterized by the intersecting and overlapping
relations. We describe a recursive algorithm that finds an independent set
in $G$, and we prove that the algorithm is a 4-approximation algorithm. We note that unlike
other scheduling problems that were approximated using this kind of technique
(see, e.g., [3]), the output of our algorithm does not necessarily induce a feasible schedule.
Namely, our algorithm finds an independent set and not necessarily a maximal
independent set. To obtain a feasible schedule one should extend the output of
the algorithm to a maximal independent set. Since the benefit function is
non-negative, the benefit of the extended solution is not less than the benefit of the
original independent set. Thus, the corresponding schedule is 4-approximate.



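Algorithm IS($G = (V, E)$, $w$)

1. If $V = \emptyset$ return $\emptyset$.
2. Choose $v_{t,t',j} \in V$ such that $\forall v_{t_1,t_1',k} \in V$: $t \leq t_1$.
3. $\forall v_{t_1,t_1',k} \in V$: $w_1(v_{t_1,t_1',k}) = w(v_{t,t',j})$ if $(t_1,t_1',k) = (t,t',j)$ or $(v_{t,t',j}, v_{t_1,t_1',k}) \in E$, and $w_1(v_{t_1,t_1',k}) = 0$ otherwise.
4. For all $v_{t_1,t_1',k} \in V$, $w_2(v_{t_1,t_1',k}) = w(v_{t_1,t_1',k}) - w_1(v_{t_1,t_1',k})$.
5. Denote $V' = \{v_{t_1,t_1',k} \in V \mid w_2(v_{t_1,t_1',k}) \leq 0\}$.
6. $\hat{V} = V \setminus V'$.
7. $\hat{S} = \mathrm{IS}(\hat{G} = (\hat{V}, E), w_2)$.
8. If $\hat{S} \cup \{v_{t,t',j}\}$ is an independent set, $S = \hat{S} \cup \{v_{t,t',j}\}$.
9. Else $S = \hat{S}$.
10. return $S$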
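Algorithm IS can be transcribed into Python as sketched below. This is our own illustration, not the authors' code: the recursion is unrolled into a loop (the vertex chosen in Line 2 at each level is pushed on a list, and Lines 8 and 9 are applied while unwinding in reverse order), vertices are triples $(t, t', j)$ with weights stored in a dict, and ties in Line 2 are broken by maximum weight as the text suggests. The helper `extend_to_maximal` performs the extension to a maximal independent set discussed above, scanning vertices in sorted order (an arbitrary choice).

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[int, int, int]  # vertex v_{t,t',j} written as (t, t_prev, j)


def adjacent(a: Triple, b: Triple) -> bool:
    """Edge of G: same object and intersecting, or different objects and overlapping."""
    if a == b:
        return False
    (t, tp, j), (t1, t1p, k) = a, b
    if j == k:
        return t1p < t and t1 > tp
    return t == t1 or tp == t1 or t == t1p or tp == t1p != 0


def algorithm_is(weights: Dict[Triple, float]) -> Set[Triple]:
    """Local ratio Algorithm IS with its recursion unrolled into a loop."""
    w = dict(weights)
    chosen: List[Triple] = []  # vertex picked in Line 2 at each recursion level
    while w:  # Line 1: the recursion bottoms out when V is empty
        t_min = min(v[0] for v in w)
        # Line 2: minimal t; among those, maximum weight
        v = max((u for u in w if u[0] == t_min), key=lambda u: w[u])
        chosen.append(v)
        wv = w[v]
        # Lines 3-4: w2 = w - w1, with w1 = w(v) on v and its neighbours, 0 elsewhere
        w2 = {u: x - wv if u == v or adjacent(u, v) else x for u, x in w.items()}
        # Lines 5-7: recurse on the vertices whose residual weight is positive
        w = {u: x for u, x in w2.items() if x > 0}
    # Lines 8-9 run while the recursion unwinds, deepest level first
    S: Set[Triple] = set()
    for v in reversed(chosen):
        if not any(adjacent(v, u) for u in S):
            S.add(v)
    return S


def extend_to_maximal(S: Set[Triple], vertices: Iterable[Triple]) -> Set[Triple]:
    """Greedily add vertices (in sorted order) while the set stays independent."""
    result = set(S)
    for v in sorted(vertices):
        if v not in result and not any(adjacent(v, u) for u in result):
            result.add(v)
    return result
```

For a single object over three slots with weights `{(1, 0, 1): 1, (2, 1, 1): 1, (3, 2, 1): 1, (3, 0, 1): 5}`, the sketch returns `{(3, 0, 1)}`, i.e., it prefers one broadcast at slot 3 (benefit 5) over the chain 1, 2, 3 (benefit 3).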





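The number of vertices in the graph depends on the number of objects and
the number of time slots, namely $|V| = \frac{1}{2}mT(T+1)$ (one vertex for every object $j$
and every pair $0 \leq t' < t \leq T$). In Line 2 the algorithm picks a vertex $v_{t,t',j}$ such
that $t$ is minimal. If more than one vertex can be selected, it is efficient to pick the
vertex with maximum weight: every other vertex $v_{t,t_1',k}$ with the same $t$ intersects
($k = j$) or overlaps ($k \neq j$) $v_{t,t',j}$, so its weight drops to at most zero in Line 4 and it
is removed in Line 6. Hence there are $O(T)$ iterations, since the minimal $t$ increases in
every iteration. In a naive implementation every iteration requires $O(mT^2)$ operations
(finding the maximum weight, subtracting weights, and removing vertices), thus the
complexity of the algorithm is $O(mT^3)$.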

Next we prove that Algorithm IS is a 4-approximation algorithm using the
Local Ratio Theorem for maximization problems (see [3] for more details).

Theorem 1 (Local Ratio [3]). Let $F$ be a set of constraints and let $p$, $p_1$ and
$p_2$ be benefit functions such that $p = p_1 + p_2$. If $x$ is an $r$-approximate solution
with respect to $(F, p_1)$ and with respect to $(F, p_2)$, then $x$ is an $r$-approximate
solution with respect to $(F, p)$.

In our context the benefit functions are $w$, $w_1$ and $w_2$ (according to Line 4,
$w = w_1 + w_2$). The constraints are that the set (of vertices) is independent.

Lemma 3. The weight of a maximum independent set with respect to $(G, w_1)$
is at most $4 \cdot w_1(v_{t,t',j})$.

Proof. By construction of $w_1$, the vertices that overlap or intersect $v_{t,t',j}$ (including
$v_{t,t',j}$ itself) are given a weight of $w_1(v_{t,t',j})$ (Line 3). All other vertices have a weight of
zero and their contribution to the weight of the independent set is null. Therefore,
we consider only vertices that intersect or overlap $v_{t,t',j}$. We divide these vertices
into two sets. Denote by $I(v_{t,t',j})$ the set of vertices that intersect $v_{t,t',j}$, including
$v_{t,t',j}$, and by $O(v_{t,t',j})$ the set of vertices that overlap $v_{t,t',j}$, including $v_{t,t',j}$.

Consider two vertices $v_{t_1,t_1',j}, v_{t_2,t_2',j} \in I(v_{t,t',j})$. Both vertices intersect $v_{t,t',j}$,
namely $t_1' < t$ and $t_2' < t$. Also, $t \leq t_1$ and $t \leq t_2$ since $t$ is minimum (Line 2).
Thus, $t_1' < t \leq t_2$ and $t_2' < t \leq t_1$, so $v_{t_1,t_1',j}$ and $v_{t_2,t_2',j}$ intersect. This means
that $I(v_{t,t',j})$ induces a clique, and hence any independent set cannot contain more
than one vertex from $I(v_{t,t',j})$.
Observe that any $v_{t_1,t_1',k} \in O(v_{t,t',j})$ satisfies $t_1 = t$, $t_1' = t$, or $t_1' = t' \neq 0$ ($t_1 = t'$
is not possible since $t$ is minimum). Thus we divide $O(v_{t,t',j})$ into three subsets:

$$O_1(v_{t,t',j}) = \{v_{t_1,t_1',k} \in O(v_{t,t',j}) \mid t_1 = t\},$$
$$O_2(v_{t,t',j}) = \{v_{t_1,t_1',k} \in O(v_{t,t',j}) \mid t_1' = t\},$$
$$O_3(v_{t,t',j}) = \{v_{t_1,t_1',k} \in O(v_{t,t',j}) \mid t_1' = t' \neq 0\}.$$

Consider two vertices $v_{t_1,t_1',k}, v_{t_2,t_2',l} \in O_1(v_{t,t',j})$. If $k = l$ the vertices intersect
(since $t_1' < t$ and $t_2' < t$), otherwise the vertices overlap. Thus, $O_1(v_{t,t',j})$ induces
a clique, and an independent set cannot contain more than one vertex
from $O_1(v_{t,t',j})$. Similarly, it can be shown that $O_2(v_{t,t',j})$ and $O_3(v_{t,t',j})$ induce
cliques. Hence, an independent set cannot contain more than one vertex from
each of $O_1(v_{t,t',j})$, $O_2(v_{t,t',j})$ and $O_3(v_{t,t',j})$.

Putting it all together, an independent set contains at most four vertices of non-zero
weight. The weight of each such vertex is $w_1(v_{t,t',j})$, thus the maximum weight is
$4 \cdot w_1(v_{t,t',j})$. □

Theorem 2. Algorithm IS is a 4-approximation algorithm.

Proof. The proof is by induction on the recursion. At the induction base (Line 1
of IS) $V = \emptyset$, and the algorithm returns an optimal solution. For the induction
step, we assume that $\hat{S}$ is 4-approximate with respect to $(\hat{G}, w_2)$ (Line 7). We
show that $S$ is 4-approximate with respect to $(G, w_2)$ and $(G, w_1)$. This proves,
by the Local Ratio Theorem, that $S$ is 4-approximate with respect to $(G, w)$.

Due to Line 5, $V'$ contains vertices with non-positive weight, thus the optimum
with respect to $(G, w_2)$ is not greater than the optimum with respect to
$(\hat{G}, w_2)$. Hence $\hat{S}$ is 4-approximate with respect to $(G, w_2)$. Due to Lines 3 and 4,
$w_2(v_{t,t',j}) = 0$; therefore adding $v_{t,t',j}$ to $\hat{S}$ (in Line 8) does not reduce the weight
of the independent set. Hence $S$ is 4-approximate with respect to $(G, w_2)$.

$S$ contains $v_{t,t',j}$ or at least one of its neighbors (otherwise $v_{t,t',j}$ would have been
added to $S$ in Line 8). Hence, the weight of $S$ with respect to $(G, w_1)$ is at least
$w_1(v_{t,t',j})$, whereas the maximum independent set has weight at most $4 \cdot w_1(v_{t,t',j})$
(see Lemma 3), namely $S$ is 4-approximate with respect to $(G, w_1)$. □

4 Variable Size Objects 

In the general case, where different objects may have different sizes, we divide the
time axis such that the transmission of one object requires one or more integral
time slots. One way to satisfy this requirement is to find the greatest common
divisor of the objects' sizes and use this value as the length of the time slot. We
denote by $\tau_j$ the number of time slots it takes to broadcast $p_j$. For instance, if the
transmission times of $p_1$, $p_2$ and $p_3$ are 0.4, 1 and 1.4 milliseconds, respectively,
the length of the time slot can be 0.2 milliseconds and $\tau_1 = 2$, $\tau_2 = 5$ and $\tau_3 = 7$.
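Computing the slot length as the greatest common divisor of the transmission times is straightforward with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction` and assumes the times are given as decimal strings, since binary floating-point values such as 0.4 are not exact and a gcd of floats would be unreliable; `slot_length` is a hypothetical helper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import List, Sequence, Tuple


def slot_length(times: Sequence[str]) -> Tuple[Fraction, List[int]]:
    """Return the gcd of the transmission times and each tau_j = time_j / slot.

    Times are decimal strings so that Fraction parses them exactly.
    """
    exact = [Fraction(x) for x in times]
    # Common denominator (lcm of the denominators), then an integer gcd.
    denom = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in exact))
    slot = Fraction(reduce(gcd, (int(f * denom) for f in exact)), denom)
    return slot, [int(f / slot) for f in exact]
```

For the example above, `slot_length(["0.4", "1", "1.4"])` returns `(Fraction(1, 5), [2, 5, 7])`, i.e., a 0.2 ms slot with $\tau_1 = 2$, $\tau_2 = 5$, $\tau_3 = 7$.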

The benefit function and the problem definition in this case are similar to 
those defined in Section 2. However, the different sizes of the objects induce 
some changes that are reflected in the feasibility of the schedule and in the exact 
definition of the benefit function. When $p_j$ is sent at time slot $t$, the $\tau_j$ time
slots starting at $t$ are occupied by this transmission. Thus, in a feasible schedule
these time slots cannot be used for other transmissions. Also, if $t$ and $t'$ are two
consecutive transmissions of $p_j$, $t$ cannot be smaller than $t' + \tau_j$, namely the
gap between two consecutive transmissions is at least $\tau_j$. Hence, $\forall t' > 0, t <
t' + \tau_j: B_{t,t',j} = 0$. Note that in Section 2.1, where $\forall j, \tau_j = 1$, we implicitly
considered this restriction by defining the benefit function to be 0 where $t' \geq t$.

In the full version of the paper we show a 10-approximation algorithm for
TDMSM using the same technique that has been used in Section 3. This
algorithm is a modified version of Algorithm IS in which in every iteration we
consider vertices that correspond to objects with a minimal transmission time,
namely the following steps should replace Line 2 in IS:

2a. Choose $j$ such that $\forall k$, $\tau_j \leq \tau_k$.

2b. Choose $v_{t,t',j} \in V$ such that $\forall v_{t_1,t_1',j} \in V$, $t \leq t_1$.

5 An Efficient Algorithm for a Special Case 

The approximation results described so far hold for any non-negative benefit 
function. However, when using this scheme in practical scenarios, the benefit 
function may obey several non-trivial limitations. In this section we consider 
certain limitations on the benefit function, and we show that in this case a 
greedy algorithm approximates the optimal solution by a factor of 3. 

Consider a scenario in which a source sends object $p_j$ via multicast in
two consecutive transmissions at time slots $t_1$ and $t_3$, where $t_3 > t_1$. The benefit of
such a schedule is $B_{t_3,t_1,j}$. If the source transmits $p_j$ also at time slot $t_2$, where
$t_1 < t_2 < t_3$, then the total benefit of these transmissions is $B_{t_3,t_2,j} + B_{t_2,t_1,j}$.
In general, this additional transmission may decrease the total benefit (i.e.,
$B_{t_3,t_2,j} + B_{t_2,t_1,j} < B_{t_3,t_1,j}$). However, this does not happen in practice.
Recall that $B_{t_3,t_1,j}$ represents the benefit of broadcasting $p_j$ at time slot $t_3$ given
that the previous multicast transmission was performed at time slot $t_1$. Therefore,
a multicast transmission at $t_2$ should not add unicast transmissions of $p_j$.
Hence, we may assume that for any object $p_j$ and time slots $t_1 < t_2 < t_3$ the
following triangle inequality holds: $B_{t_3,t_2,j} + B_{t_2,t_1,j} \geq B_{t_3,t_1,j}$. Notice that due
to the triangle inequality we may assume that there exists an optimal solution
that uses the multicast channel in every time slot.

Next we consider a restriction that indicates that benefit cannot appear
suddenly. Assume that a source sends $p_j$ via multicast at time slots $t_2$ and $t_3$, where
$t_3 > t_2$. The benefit of such a schedule is $B_{t_3,t_2,j}$. In many practical scenarios this
benefit comes from the fact that the object was sent at $t_3$ (regardless of the
previous time it was sent) and from the fact that it was already sent at $t_2$. Thus,
for any time slot $t_1$ such that $t_1 < t_2$ the following skewed triangle inequality
should hold: $B_{t_2,t_1,j} + B_{t_3,t_1,j} \geq B_{t_3,t_2,j}$. In the full version we show that TDMSM
remains NP-hard even under both restrictions.
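Whether a given benefit function satisfies the two restrictions can be checked by brute force over all increasing triples of time slots. The sketch below is ours: it stores $B$ as a dict keyed by $(t, t', j)$ with missing entries equal to zero, and it also tests $t_1 = 0$ (no earlier broadcast), which is an assumption on our part about the intended scope of the inequalities.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]  # (t, t_prev, j)


def violations(B: Dict[Triple, float], T: int, m: int) -> List[Tuple[str, int, int, int, int]]:
    """Report every violated triangle / skewed triangle inequality.

    For each object j and slots t1 < t2 < t3 taken from 0..T (t1 = 0 means
    "no earlier broadcast", an assumption of this sketch):
      triangle:        B[t3,t2,j] + B[t2,t1,j] >= B[t3,t1,j]
      skewed triangle: B[t2,t1,j] + B[t3,t1,j] >= B[t3,t2,j]
    Entries missing from B are treated as 0.
    """
    def b(t: int, t_prev: int, j: int) -> float:
        return B.get((t, t_prev, j), 0.0)

    found = []
    for j in range(1, m + 1):
        for t1, t2, t3 in combinations(range(T + 1), 3):  # increasing triples
            if b(t3, t2, j) + b(t2, t1, j) < b(t3, t1, j):
                found.append(("triangle", j, t1, t2, t3))
            if b(t2, t1, j) + b(t3, t1, j) < b(t3, t2, j):
                found.append(("skewed", j, t1, t2, t3))
    return found
```

For instance, with $B_{1,0,1} = 0.01$, $B_{2,1,1} = 1$, $B_{1,0,2} = 0.02$ and all other entries zero, the only violation reported is `("skewed", 1, 0, 1, 2)`: the benefit of sending $p_1$ at slot 2 appears "suddenly".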

Recall that a schedule $S$ is a function, where $S(t)$ is the index $j$ of the object
that was sent at time slot $t$. Let $P_S(j, t)$ be the time slot before time $t$ in which
object $j$ was last broadcast in solution $S$. If object $j$ was not broadcast
before time $t$, we define $P_S(j, t) = 0$. Also, let $N_S(j, t)$ be the first time slot
after time $t$ in which object $j$ was broadcast in solution $S$. If object $j$ was not
broadcast after time $t$, we define $N_S(j, t) = T + 1$ (and $B_{T+1,t,j} = 0$). We denote by
$B(S)$ the benefit gained by the solution $S$.

Next we present our greedy algorithm. 

Algorithm Greedy

1. for $t = 1$ to $T$ do
2.     $\ell \leftarrow \arg\max_j \{B_{t,P_G(j,t),j}\}$
3.     Broadcast $p_\ell$


For general benefit functions Greedy does not guarantee any approximation 
ratio. For instance, consider the following benefit function: 



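(rows are indexed by $t'$ and columns by $t$)

$$
\text{Object 1:}\quad
\begin{array}{c|ccc}
t' \backslash t & 0 & 1 & 2 \\ \hline
0 & 0 & \epsilon & 0 \\
1 & 0 & 0 & 1 \\
2 & 0 & 0 & 0
\end{array}
\qquad
\text{Object 2:}\quad
\begin{array}{c|ccc}
t' \backslash t & 0 & 1 & 2 \\ \hline
0 & 0 & 2\epsilon & 0 \\
1 & 0 & 0 & 0 \\
2 & 0 & 0 & 0
\end{array}
$$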



In this case the benefit of the optimal schedule is $B_{1,0,1} + B_{2,1,1} = 1 + \epsilon$, while
the benefit of the schedule returned by the algorithm is $B_{1,0,2} + B_{2,1,2} = 2\epsilon$,
so the ratio $(1 + \epsilon)/(2\epsilon)$ grows without bound as $\epsilon \to 0$.
Nevertheless, we show that if the benefit function follows the above-mentioned
restrictions, Algorithm Greedy returns 3-approximate solutions.
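Algorithm Greedy and the counterexample can be reproduced with a short script. This is a sketch under our own conventions: a schedule is a list whose entry $t-1$ holds the object sent in slot $t$ (`None` for an idle slot), $B$ is a dict keyed by $(t, t', j)$ with missing entries equal to zero, and `optimum` is a hypothetical brute-force helper that is only practical for tiny instances.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]  # (t, t_prev, j)


def schedule_benefit(S: Sequence[Optional[int]], B: Dict[Triple, float]) -> float:
    """B(S) for a schedule given as a list: S[t-1] is the object sent in slot t (None = idle)."""
    last: Dict[int, int] = {}  # object -> slot of its latest transmission
    total = 0.0
    for t, j in enumerate(S, start=1):
        if j is not None:
            total += B.get((t, last.get(j, 0), j), 0.0)
            last[j] = t
    return total


def greedy(B: Dict[Triple, float], T: int, m: int) -> Tuple[List[int], float]:
    """Algorithm Greedy: in each slot t send the object maximising B_{t,P_G(j,t),j}."""
    last = {j: 0 for j in range(1, m + 1)}  # P_G(j, t); 0 means not sent yet
    S: List[int] = []
    for t in range(1, T + 1):
        # max() returns the first maximiser, so ties go to the smallest index
        best = max(range(1, m + 1), key=lambda j: B.get((t, last[j], j), 0.0))
        S.append(best)
        last[best] = t
    return S, schedule_benefit(S, B)


def optimum(B: Dict[Triple, float], T: int, m: int) -> float:
    """Brute force over all (m + 1)^T schedules; only usable on tiny instances."""
    choices: List[Optional[int]] = [None] + list(range(1, m + 1))
    return max(schedule_benefit(S, B) for S in product(choices, repeat=T))
```

With $\epsilon = 0.01$, i.e. `B = {(1, 0, 1): 0.01, (2, 1, 1): 1.0, (1, 0, 2): 0.02}`, `greedy(B, T=2, m=2)` obtains benefit 0.02 while `optimum(B, T=2, m=2)` is 1.01. Ties are broken toward the smaller object index here, so in slot 2 the sketch sends $p_1$ rather than $p_2$; both choices have benefit 0.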

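We denote by $G$ the greedy solution and by $Opt$ an optimal solution that uses the
multicast channel in every time slot (such a solution exists by the triangle inequality).
We also define a series of hybrid solutions denoted by $H_0, \dots, H_T$, where

$$H_t(i) = \begin{cases} G(i) & i \leq t, \\ Opt(i) & i > t. \end{cases}$$

Note that $H_0 = Opt$ and $H_T = G$. We now examine the difference in benefit
gained by $H_{t-1}$ and by $H_t$. Denote this difference by $\Delta_t$, that is
$\Delta_t = B(H_{t-1}) - B(H_t)$. We show $\Delta_t \leq 2 \cdot B_{t,P_G(G(t),t),G(t)}$.
Let $j = Opt(t)$ and $\ell = G(t)$. If $j = \ell$ then $\Delta_t = 0$; otherwise, since $H_{t-1}$ and $H_t$
differ only in time slot $t$,

$$\Delta_t = \left(B_{t,P_{H_t}(j,t),j} + B_{N_{H_t}(j,t),t,j} - B_{N_{H_t}(j,t),P_{H_t}(j,t),j}\right) - \left(B_{t,P_{H_t}(\ell,t),\ell} + B_{N_{H_t}(\ell,t),t,\ell} - B_{N_{H_t}(\ell,t),P_{H_t}(\ell,t),\ell}\right).$$

By the triangle inequality $B_{N_{H_t}(\ell,t),t,\ell} + B_{t,P_{H_t}(\ell,t),\ell} \geq B_{N_{H_t}(\ell,t),P_{H_t}(\ell,t),\ell}$. Thus,

$$\Delta_t \leq B_{t,P_{H_t}(j,t),j} + B_{N_{H_t}(j,t),t,j} - B_{N_{H_t}(j,t),P_{H_t}(j,t),j}.$$

By the skewed triangle inequality,

$$B_{t,P_{H_t}(j,t),j} + B_{N_{H_t}(j,t),P_{H_t}(j,t),j} \geq B_{N_{H_t}(j,t),t,j},$$

which means that $\Delta_t \leq 2B_{t,P_{H_t}(j,t),j}$. $H_t(i) = G(i)$ for every $i < t$ by the
definition of $H_t$, thus $\Delta_t \leq 2B_{t,P_G(j,t),j}$. $B_{t,P_G(j,t),j} \leq B_{t,P_G(\ell,t),\ell}$ since Greedy
picks the object that maximizes the benefit, and therefore $\Delta_t \leq 2B_{t,P_G(\ell,t),\ell}$.
Greedy is a 3-approximation algorithm since $B(G) \geq \frac{1}{3} \cdot B(Opt)$:

$$B(Opt) - B(G) = B(H_0) - B(H_T) = \sum_{t=1}^{T} \Delta_t \leq 2 \sum_{t=1}^{T} B_{t,P_G(G(t),t),G(t)} = 2B(G). \quad \square$$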
Abstract. This paper considers the convergence problem in autonomous
mobile robot systems. A natural algorithm for the problem
requires the robots to move towards their center of gravity. Previously it 
was known that the gravitational algorithm converges in the synchronous 
or semi-synchronous model, and that two robots converge in the asyn- 
chronous model. The current paper completes the picture by proving 
the correctness of the gravitational algorithm in the fully asynchronous 
model for any number of robots. It also analyses its convergence rate, 
and establishes its convergence in the presence of crash faults. 



1 Introduction 

1.1 Background and Motivation 

Swarms of low cost robots provide an attractive alternative when facing various 
large-scale tasks in hazardous or hostile environments. Such systems can be made 
cheaper, more flexible and potentially resilient to malfunction. Indeed, interest 
in autonomous mobile robot systems arose in a variety of contexts (see [4,5,14, 
15,16,17,18,19,20,21,28] and the survey in [6,7]). 

Along with developments related to the physical engineering aspects of such 
robot systems, there have been recent research attempts geared at developing 
suitable algorithmics, particularly for handling the distributed coordination of 
multiple robots [3,8,9,22,24,26,27]. A number of computational models were pro- 
posed in the literature for multiple robot systems. In this paper we consider the 
fully asynchronous model of [8,9,12,23]. In this model, the robots are assumed 
to be identical and indistinguishable, lack means of communication, and operate 
in Look-Compute-Move cycles. Each robot wakes up at unspecified times, ob- 
serves its environment using its sensors (capable of identifying the locations of 
the other robots), performs a local computation determining its next move and 
moves accordingly. 

Much of the literature on distributed control algorithms for autonomous mo- 
bile robots has concentrated on two basic tasks, called gathering and convergence. 
Gathering requires the robots to occupy a single point within finite time, regard- 
less of their initial configuration. Convergence is the closely related task in which 
the robots are required to converge to a single point, rather than reach it. More 
precisely, for every $\epsilon > 0$ there must be a time by which all robots are within
a distance of at most $\epsilon$ of each other.

A common and straightforward approach to these tasks relies on the robots 
in the swarm calculating some median position and moving towards it. Arguably 
the most natural variant of this approach is the one based on using the center of 
gravity (sometimes called also the center of mass, the barycenter or the average) 
of the robot group. This approach is easy to analyze in the synchronous model. 
In the asynchronous model, analyzing the process becomes more involved, since 
the robots operate at different rates and may take measurements at different 
times, including while other robots are in movement. The inherent asynchrony 
in operation might therefore cause various oscillatory effects on the centers of
gravity calculated by the robots, preventing them from moving towards each
other and possibly even causing them to diverge and stray away from each other
in certain scenarios.

Several alternative, more involved, algorithms have been proposed in the 
literature for the gathering and convergence problems. The gathering problem 
was first discussed in [26,27] in a semi-synchronous model, where the robots 
operate in cycles but not all robots are active in every cycle. It was proven 
therein that it is impossible to gather two oblivious autonomous mobile robots 
without a common orientation under the semi-synchronous model (although 2- 
robot convergence is easy to achieve in this setting). On the other hand, there 
is an algorithm for gathering $N \geq 3$ robots in the semi-synchronous model
[27]. In the asynchronous model, an algorithm for gathering $N = 3, 4$ robots is
presented in [9,23], and an algorithm for gathering $N \geq 5$ robots has recently
been described in [8]. The gathering problem was also studied in a system where
the robots have limited visibility [2,13]. Fault-tolerant algorithms for gathering
were studied in [1]. In a failure-prone system, a gathering algorithm is required
to successfully gather the nonfaulty robots, independently of the behavior of 
the faulty ones. The paper presents an algorithm tolerant against a single crash 
failure in the asynchronous model. For Byzantine faults, it is shown therein that 
in the asynchronous model it is impossible to gather a 3-robot system, even in the 
presence of a single Byzantine fault. In the fully synchronous model, an algorithm 
is provided for gathering $N$ robots with up to $f$ faults, where $N \geq 3f + 1$.

Despite the existence of these elaborate gathering algorithms, the gravita- 
tional approach is still very attractive, for a number of reasons. To begin with, it 
requires only very simple and efficient calculations, which can be performed on 
simple hardware with minimal computational efforts. It can be applied equally 
easily to any number of dimensions and to any swarm size. Moreover, the errors 
it incurs due to rounding are bounded and simple to calculate. In addition, it is 
oblivious (i.e., it does not require the robots to store any information on their 
previous operations or on past system configurations). This makes the method 
both memory-efficient and self-stabilizing (meaning that following a finite num- 
ber of transient errors that change the states of some of the robots into other 
(possibly illegal) states, the system returns to a legal state and achieves even- 
tual convergence). Finally, the method avoids deadlocks, in the sense that every 
robot can move at any given position (unless it has already reached the center
of gravity). These advantages may well make the gravitational algorithm the 
method of choice in many practical situations. 

Consequently, it is interesting to study the correctness and complexity properties
of the gravitational approach to convergence. This study is the focus of the
current paper. Preliminary progress was made by us in [10], where we proved the
correctness of the gravitational algorithm in the semi-synchronous model of [26].
In the asynchronous model, we provided a convergence proof for the special case 
of a 2-robot system. In the current paper, we complete the picture by proving 
a general theorem about the convergence of the center of gravity algorithm in 
the fully asynchronous model. We also analyze the convergence rate of the algorithm.
Finally, we establish convergence in the crash fault model. Specifically, we
show that in the presence of $f$ crash faults, $1 \leq f \leq N - 2$, the $N - f$ nonfaulty
robots will converge to the center of gravity of the crashed robots.



1.2 The Model 

The basic model studied in [3,8,9,22,24,26,27] can be summarized as follows. The 
N robots execute a given algorithm in order to achieve a prespecified task. Each 
robot i in the system operates individually, repeatedly going through simple 
cycles consisting of three steps: 

— Look: Identify the locations of all robots in i's private coordinate system 
and obtain a multiset of points $P = \{p_1, \dots, p_N\}$ defining the current
configuration. The robots are indistinguishable, so $i$ knows its own location $p_i$ but
does not know the identity of the robots at each of the other points. This 
model allows robots to detect multiplicities, i.e., when two or more robots 
reside at the same point, all robots will detect this fact. Note that this model 
is stronger than, e.g., the one of [8]. 

— Compute: Execute the given algorithm, resulting in a goal point $p_g$.

— Move: Move on a straight line towards the point $p_g$. The robot might stop
before reaching its goal point $p_g$, but is guaranteed to traverse a distance of
at least S (unless it has reached the goal). The value of S is not assumed to
be known to the robots, and they cannot use it in their calculations.

The Look and Move operations are identical in every cycle, and the differences 
between various algorithms are in the Compute step. The procedure carried out 
in the Compute step is identical for all robots. 
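The Look-Compute-Move cycle with the center of gravity as goal point can be illustrated by a toy simulation. This is a simplified sketch, not the model analyzed in the paper: one randomly chosen robot is activated at a time and completes its whole cycle atomically (so no robot is ever observed in mid-move, unlike the fully asynchronous model), positions are kept in a single global frame, and the point at which a robot stops is drawn at random between the minimum travel distance (the hypothetical parameter `min_step`, playing the role of S in the Move step) and the full distance to the goal. Private coordinate systems would not change the goal point, since averaging commutes with any affine change of coordinates.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def center_of_gravity(points: List[Point]) -> Point:
    """Compute step of the gravitational algorithm: average of all observed positions."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def move(p: Point, goal: Point, min_step: float, rng: random.Random) -> Point:
    """Move step: head straight for goal; may stop early, but covers at least
    min_step unless the goal is closer than that (then it is reached)."""
    dist = math.dist(p, goal)
    if dist <= min_step:
        return goal
    f = rng.uniform(min_step, dist) / dist  # stopping point chosen at random
    return (p[0] + f * (goal[0] - p[0]), p[1] + f * (goal[1] - p[1]))


def simulate(positions: List[Point], steps: int, min_step: float, seed: int = 0) -> List[Point]:
    """Activate one random robot per step; its Look-Compute-Move cycle is atomic."""
    rng = random.Random(seed)
    pos = list(positions)
    for _ in range(steps):
        i = rng.randrange(len(pos))
        goal = center_of_gravity(pos)               # Look + Compute
        pos[i] = move(pos[i], goal, min_step, rng)  # Move
    return pos


def diameter(points: List[Point]) -> float:
    """Largest distance between two robots."""
    return max(math.dist(a, b) for a in points for b in points)
```

Starting from the corners of a unit square, `simulate(..., steps=2000, min_step=0.01)` leaves the four robots within a distance far below $10^{-3}$ of each other, which is the convergence behavior (not gathering) discussed above.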

In most papers in this area (cf. [9,13,25,26]), the robots are assumed to be 
rather limited. To begin with, the robots are assumed to have no means of di- 
rectly communicating with each other. Moreover, they are assumed to be oblivi- 
ous (or memoryless), namely, they cannot remember their previous states, their 
previous actions or the previous positions of the other robots. Hence the algo- 
rithm used in the Compute step cannot rely on information from previous cycles, 
and its only input is the current configuration. While this is admittedly an over- 
restrictive and unrealistic assumption, developing algorithms for the oblivious 
model still makes sense in various settings, for two reasons. First, solutions that
rely on non-obliviousness do not necessarily work in a dynamic environment 
where the robots are activated in different cycles, or robots might be added to 
or removed from the system dynamically. Secondly, any algorithm that works 
correctly for oblivious robots is inherently self-stabilizing, i.e., it withstands tran- 
sient errors that alter the robots’ local states into other (possibly illegal) states. 

We consider the fully asynchronous timing model (cf. [8,9]). In this model, 
robots operate on their own (time-varying) rates, and no assumptions are made 
regarding the relative speeds of different robots. In particular, robots may re- 
main inactive for arbitrarily long periods between consecutive operation cycles 
(subject to some “fairness” assumption that ensures that each robot is activated 
infinitely often in an infinite execution). 

In order to describe the center of gravity algorithm, hereafter named Algorithm Go_to_COG, we use the following notation. Denote by $r_i[t]$ the location of robot $i$ at time $t$. Denote the true center of gravity at time $t$ by $c[t] = \frac{1}{N}\sum_{i=1}^{N} r_i[t]$. Denote by $c_i[t]$ the center of gravity as last calculated by robot $i$ before or at time $t$, i.e., if the last calculation by $i$ was done at time $t' \le t$, then $c_i[t] = c[t']$. Note that, as mentioned before, robot $i$ calculates this location in its own private coordinate system; however, for the purpose of describing the algorithm and its analysis, it is convenient to represent these locations in a unified global coordinate system (which of course is unknown to the robots themselves). This is justified by the linearity of the center of gravity calculation, which makes it commute with any linear transformation (and translation) of the coordinate system. By convention, $c_i[0] = r_i[0]$ for all $i$.
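The invariance appeal above can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical private frame that differs from the global one by a map $x \mapsto Ax + b$ (here a rotation by 90 degrees, a scaling by 2 and a shift of the origin), and confirms that the center of gravity computed in that frame is the image of the global center of gravity.

```python
import random


def centroid(points):
    """Center of gravity of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def to_frame(A, b, p):
    """Map a global point p into a private frame via x -> A x + b (A is 2x2)."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


rng = random.Random(7)
pts = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(6)]
A = ((0.0, -2.0), (2.0, 0.0))   # hypothetical frame: rotate by 90 degrees, scale by 2
b = (3.0, -1.0)                 # ... and move the origin
local = centroid([to_frame(A, b, p) for p in pts])   # what the robot computes
mapped = to_frame(A, b, centroid(pts))               # global result mapped into its frame
print(local, mapped)            # equal up to floating-point rounding
```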

Algorithm Go_to_COG is very simple. After measuring the current configuration at some time $t$, robot $i$ computes the average location of all robot positions (including its own), $c_i[t] = \frac{1}{N}\sum_{j=1}^{N} r_j[t]$, and then proceeds to move towards the calculated point $c_i[t]$. (As mentioned earlier, the move may terminate before the robot actually reaches the point $c_i[t]$, but in case the robot has not reached $c_i[t]$, it must have traversed a distance of at least $S$.)
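A single Look-Compute-Move cycle of Go_to_COG can be sketched as follows. This is a minimal illustration, not the authors' code: positions are tuples in a common frame, and the hypothetical argument `travel` plays the role of the scheduler, which may stop the robot early but never before it has covered $\min\{S, \text{distance}\}$.

```python
import math


def compute_goal(config):
    """Compute step of Go_to_COG: the average of all observed positions (own included)."""
    n, d = len(config), len(config[0])
    return tuple(sum(p[k] for p in config) / n for k in range(d))


def move(p, goal, S, travel):
    """Move step: go straight towards goal, covering `travel` units, clamped so the
    robot covers at least min(S, dist) and never passes the goal."""
    dist = math.dist(p, goal)
    if dist == 0:
        return tuple(p)
    travel = min(dist, max(travel, min(S, dist)))
    f = travel / dist
    return tuple(x + f * (g - x) for x, g in zip(p, goal))


cfg = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
goal = compute_goal(cfg)                       # (4/3, 4/3)
print(move(cfg[1], goal, S=0.5, travel=0.0))   # stopped early, after exactly S = 0.5
```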

2 Asynchronous Convergence 

This section proves our main result, namely, that Algorithm Go_to_COG guarantees the convergence of $N$ robots for any $N \ge 2$ in the asynchronous model.

As noted in [10], the convex hull of the robot locations and calculated centers 
of gravity cannot increase in time. Intuitively, this is because (1) while a robot 
$i$ performs a Move operation starting at time $t_0$, $c_i$ does not change throughout the Move phase and $r_i[t]$ remains on the line segment $[r_i[t_0], c_i[t_0]]$, which is
contained in the convex hull, and (2) when a robot performs a Look step, the 
calculated center of gravity is inside the convex hull of the N robot locations at 
that time. Hence we have the following. 

Lemma 1. [10] If for some time $t_0$, $r_i[t_0]$ and $c_i[t_0]$ for all $i$ reside in a closed convex curve $V$, then for every time $t > t_0$, $r_i[t]$ and $c_i[t]$ also reside in $V$ for every $1 \le i \le N$.
Hereafter we assume, for the time being, that the configuration at hand is one-dimensional, i.e., the robots reside on the $x$-axis. Later on, we extend the
convergence proof to d-dimensions by applying the result to each dimension 
separately. 

For every time $t$, let $H[t]$ denote the convex hull of the points of interest $\{r_i[t] \mid 1 \le i \le N\} \cup \{c_i[t] \mid 1 \le i \le N\}$, namely, the smallest closed interval containing all $2N$ points.



Corollary 1. For $N \ge 2$ robots and times $t_0, t_1$, if $t_1 > t_0$ then $H[t_1] \subseteq H[t_0]$.

Unfortunately, it is hard to prove convergence on the basis of $h$ alone, since it is hard to show that $h$ strictly decreases. Other potentially promising measures, such as $\phi_1$ and $\phi_2$ defined next, also prove problematic, as they might increase in certain scenarios. Consequently, the measure $\psi$ we use in what follows to prove strict convergence is defined as a combination of a number of different measures. Formally, let us define the following quantities.



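$$\phi_1[t] = \sum_{i=1}^{N} |c_i[t] - c[t]|, \qquad \phi_2[t] = \sum_{i=1}^{N} |c_i[t] - r_i[t]|,$$

$$\phi[t] = \phi_1[t] + \phi_2[t], \qquad h[t] = |H[t]|, \qquad \psi[t] = \frac{\phi[t]}{2N} + h[t].$$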



We now claim that $\phi$, $h$ and $\psi$ are nonincreasing functions of time.

Lemma 2. For every $t_1 > t_0$, $\phi[t_1] \le \phi[t_0]$.

Proof: Examine the change in $\phi$ due to the various robot actions. If a Look operation is performed by robot $i$ at time $t$, then $c[t] - c_i[t] = 0$ and $|r_i[t] - c_i[t]| = |c_i[t^*] - c[t]|$ for any $t^* \in [t', t)$, where $t'$ is the end of the last Move performed by robot $i$. Therefore, $\phi$ is unchanged by the Look performed.

Now consider some time interval $[t_0', t_1'] \subseteq [t_0, t_1]$ such that no Look operations were performed during $[t_0', t_1']$. Suppose that during this interval each robot $i$ moved a distance $\Delta_i$ (where some of these distances may be 0). Then $\phi_2$ decreased by $\sum_i \Delta_i$, the maximum change in the center of gravity is $|c[t_1'] - c[t_0']| \le \sum_i \Delta_i / N$, and the robots' calculated centers of gravity have not changed. Therefore, the change in $\phi_1$ is at most $\phi_1[t_1'] - \phi_1[t_0'] \le \sum_i \Delta_i$. Hence, the sum $\phi = \phi_1 + \phi_2$ cannot increase. □

By Lemma 2 and Cor. 1, respectively, $\phi$ and $h$ are nonincreasing. Hence we have:



Lemma 3. $\psi$ is a nonincreasing function of time.
Lemma 4. For all $t$, $h[t] \le \psi[t] \le 2h[t]$.

Proof: The lower bound is trivial. For the upper bound, notice that $\phi$ is the sum of $2N$ summands, each of which is at most $h$ (since all the points involved reside in the segment $H[t]$), so $\phi/(2N) \le h$. □
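The potentials can be computed directly from their definitions, which gives a quick sanity check of Lemma 4. The sketch assumes the one-dimensional setting of this section; drawing the computed centers inside the current range of robot positions is an assumption made only to generate valid test configurations.

```python
import random


def potentials(r, c):
    """Return (phi1, phi2, h, psi) for 1-D positions r and computed centers c."""
    n = len(r)
    g = sum(r) / n                                     # true center of gravity c[t]
    phi1 = sum(abs(ci - g) for ci in c)                # sum_i |c_i[t] - c[t]|
    phi2 = sum(abs(ci - ri) for ci, ri in zip(c, r))   # sum_i |c_i[t] - r_i[t]|
    h = max(r + c) - min(r + c)                        # length of H[t]
    return phi1, phi2, h, (phi1 + phi2) / (2 * n) + h


rng = random.Random(3)
for _ in range(1000):
    n = rng.randint(2, 8)
    r = [rng.uniform(0.0, 1.0) for _ in range(n)]
    c = [rng.uniform(min(r), max(r)) for _ in range(n)]   # assumed valid centers
    _, _, h, psi = potentials(r, c)
    assert h <= psi <= 2 * h + 1e-12                      # Lemma 4
```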

We now state a lemma which allows the analysis of the change in $\phi$ (and therefore also $\psi$) in terms of the contributions of individual robots.

Lemma 5. If, in the time interval $[t_0, t_1]$, the contribution to $\phi$ of the actions of each robot $i$ taken separately is $\delta_i$, then $\phi[t_1] \le \phi[t_0] + \sum_i \delta_i$.

Proof: Lemma 2 implies that Look actions have no effect on $\phi$ and therefore can be disregarded. A robot moving a distance $\Delta_i$ will always decrease its term in $\phi_2$ by $\Delta_i$, and the motions of other robots have no effect on this term. Denote by $\Delta_j$ the motions of the other robots. Notice that

$$|c[t_1] - c[t_0]| \le \frac{\Delta_i}{N} + \sum_{j \ne i} \frac{\Delta_j}{N}.$$

The function $\phi_1$ contains $N$ summands, each of which contains a contribution of at most $\Delta_j/N$ from every robot $j$. Therefore, the total contribution of each robot $j$ to $\phi_1$ is at most $|\Delta_j|$, which is canceled by the negative contribution of $|\Delta_j|$ to $\phi_2$. □



Theorem 1. For every time $t_0$, there exists some time $\hat t > t_0$ such that

$$\psi[\hat t] \le \left(1 - \frac{1}{8N^2}\right)\psi[t_0].$$



Proof: Assume without loss of generality that at time $t_0$, the robots and centers of gravity resided in the interval $H[t_0] = [0, 1]$ (and thus $h[t_0] = 1$ and $\psi[t_0] \le 2$). Take $t^*$ to be the time after $t_0$ by which each robot has completed at least one entire Look-Compute-Move cycle. There are now two possibilities:

1. Every center of gravity $c_i[t']$ that was calculated at a time $t' \in [t_0, t^*]$ resided in the segment $(\frac{1}{2N}, 1]$. In this case at time $t^*$ no robot can reside in the segment $[0, \frac{1}{2N}]$ (since every robot has completed at least one cycle operation, where it has arrived at its calculated center of gravity outside the segment $[0, \frac{1}{2N}]$, and from then on it may have moved a few more times to its newly calculated centers of gravity, which were also outside this segment). Hence at time $\hat t = t^*$ all robots and centers of gravity reside in $H[\hat t] \subseteq [\frac{1}{2N}, 1]$, so $h[\hat t] \le 1 - \frac{1}{2N}$, and $\phi[\hat t] \le \phi[t_0]$. Therefore,

$$\psi[\hat t] = \frac{\phi[\hat t]}{2N} + h[\hat t] \le \frac{\phi[t_0]}{2N} + 1 - \frac{1}{2N} = \psi[t_0] - \frac{1}{2N}.$$

Also, by Lemma 4, $\psi[t_0] \le 2$, hence $\frac{1}{2N} \ge \frac{\psi[t_0]}{4N}$. Combined, we get $\psi[\hat t] \le \left(1 - \frac{1}{4N}\right)\psi[t_0]$.
2. For some $t_1 \in [t_0, t^*]$, the center of gravity $c_i[t_1] = c[t_1]$ calculated by some robot $i$ at time $t_1$ resided in $[0, \frac{1}{2N}]$. Therefore, at time $t_1$ all robots resided in the segment $[0, \frac12]$ (by Markov's inequality [11]: all positions are nonnegative, so a single robot beyond $\frac12$ would already place the average above $\frac{1}{2N}$). We split again into two subcases:

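a) At time $t_1$ all centers of gravity $c_i[t_1]$ resided in the interval $[0, \frac34]$. In this case, take $\hat t = t^*$; then $h[\hat t] \le h[t_1] \le \frac34 = \frac34 h[t_0]$, therefore

$$\psi[\hat t] = \frac{\phi[\hat t]}{2N} + h[\hat t] \le \frac{\phi[t_0]}{2N} + \frac34 h[t_0] = \psi[t_0] - \frac14 h[t_0] \le \frac78\,\psi[t_0],$$

where the last inequality is due to the fact that $\frac{\phi[t_0]}{2N} \le h[t_0]$, as argued in the proof of Lemma 4. The theorem immediately follows.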
b) At time $t_1$ there existed robots with $c_i[t_1] > \frac34$. In this case, take $k$ to be the robot with the highest center of gravity (or one of them) and take $\hat t$ to be the time robot $k$ completes its next Move. Its move size is at least $\Delta_k \ge \frac14$, hence its motion decreased $|r_k - c_k|$ by $\Delta_k \ge \frac14$, and $\phi_2$ is decreased by $\Delta_k$. Also, the robots and calculated centers of gravity are now restricted to the segment $[0, c_k]$. Since the true center of gravity must be to the left of $c_k$, $|c - c_k|$ is decreased by $\Delta_k/N$. The sum of all other summands in $\phi_1$ may increase by at most $(N-1)\Delta_k/N$. By Lemma 5, and since by the argument of Lemma 2 no robot's separate contribution to $\phi$ is positive, it suffices to consider the contribution of robot $k$. Therefore, $\phi$ decreased by at least $2\Delta_k/N$, and the term $\frac{\phi}{2N}$ in $\psi$ decreased by at least $\frac{\Delta_k}{N^2} \ge \frac{1}{4N^2}$. As $\psi[t_0] \le 2$ and $h$ is nonincreasing, $\psi$ decreased by at least $\frac{\psi[t_0]}{8N^2}$. The theorem follows. □



To prove the convergence of the gravitational algorithm in $d$-dimensional Euclidean space, apply Theorem 1 to each dimension separately. Observe that by Theorem 1 and Lemma 4, for every $\varepsilon > 0$ there is a time $t_\varepsilon$ by which $h[t_\varepsilon] \le \psi[t_\varepsilon] < \varepsilon$, hence the robots have converged to an $\varepsilon$-neighborhood.

Theorem 2. For any $N \ge 2$, in $d$-dimensional Euclidean space, $N$ robots performing Algorithm Go_to_COG will converge.
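Lemmas 2 to 4 and Theorem 2 can be observed in a small simulation. The sketch below is a hypothetical one-dimensional experiment, not an adversarial execution: a random scheduler interleaves Looks with partial Moves of different robots, each Move stops at a random point but covers at least $\min\{S, \text{distance}\}$, and the pair $(h, \psi)$ is recorded after every event.

```python
import random


def potentials(r, c):
    """Return (h, psi) for 1-D robot positions r and last computed centers c."""
    n = len(r)
    g = sum(r) / n                                        # true center of gravity
    phi = (sum(abs(ci - g) for ci in c)                   # phi_1
           + sum(abs(ci - ri) for ci, ri in zip(c, r)))   # phi_2
    h = max(r + c) - min(r + c)                           # length of H[t]
    return h, phi / (2 * n) + h


def simulate(r0, events=20000, S=1e-3, seed=1):
    """Random asynchronous run of Go_to_COG on the x-axis; returns the (h, psi) trace.

    Each event advances one robot by one action: a Look (with its Compute), or one
    sub-step of its current Move. Moves thus interleave with other robots' actions,
    and each Move stops at a random point after at least min(S, distance).
    """
    rng = random.Random(seed)
    n = len(r0)
    r, c = list(r0), list(r0)        # convention: c_i[0] = r_i[0]
    budget = [0.0] * n               # distance still to travel in the current Move
    sign = [0.0] * n                 # direction of the current Move
    looking = [True] * n             # True if the robot's next action is a Look
    trace = [potentials(r, c)]
    for _ in range(events):
        i = rng.randrange(n)
        if looking[i]:
            c[i] = sum(r) / n                                  # goal: average position
            dist = abs(c[i] - r[i])
            budget[i] = dist if dist <= S else rng.uniform(S, dist)
            sign[i] = 1.0 if c[i] >= r[i] else -1.0
            looking[i] = False
        else:
            step = budget[i] if rng.random() < 0.5 else budget[i] / 2
            r[i] += sign[i] * step
            budget[i] -= step
            if budget[i] == 0.0:                               # Move finished
                looking[i] = True
        trace.append(potentials(r, c))
    return trace


trace = simulate([0.0, 0.0, 0.0, 1.0, 5.0])
print(trace[0], trace[-1])
```

Along such a trace neither $h$ nor $\psi$ increases, and $h$ ends far below its initial value; a random scheduler says nothing about the worst-case rate, which is the subject of Section 3.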



3 Convergence Rate 

To bound the rate of convergence in the fully asynchronous model, one should 
make some normalizing assumption on the operational speed of the robots. A 
standard type of assumption is based on defining the maximum length of a 
robot cycle during the execution (i.e., the maximum time interval between two 
consecutive Look steps of the same robot) as one time unit. For our purposes it 
is more convenient to make the slightly modified assumption that for every time 
$t$, during the time interval $[t, t+1]$ every robot has completed at least one cycle.
Note that the two assumptions are equivalent up to a constant factor of 2. Note 
also that this assumption is used only for the purpose of complexity analysis, 
and was not used in our correctness proof. 



Convergence Properties of the Gravitational Algorithm 235 



Lemma 6. For every time interval $[t_0, t_1]$,

$$\psi[t_1] \le \left(1 - \frac{1}{8N^2}\right)^{\lfloor (t_1 - t_0)/2 \rfloor}\psi[t_0].$$

Proof: Consider the different cases analyzed in the proof of Theorem 1. By our timing assumption, we can take $t^* = t_0 + 1$. In case 1, $\psi$ is decreased by a factor of $1 - \frac{1}{4N}$ by time $\hat t = t^*$, i.e., within one time unit. In case 2a, $\psi$ is decreased by a factor of $\frac78$ by time $\hat t = t^*$, i.e., within one time unit again. The slowest convergence rate is obtained in case 2b. Here, we can take $\hat t = t^* + 1 = t_0 + 2$, and conclude that $\psi$ is decreased by a factor of $1 - \frac{1}{8N^2}$ after two time units. The lemma follows by assuming a worst case scenario in which during the time interval $[t_0, t_1]$ the “slow” case 2b is repeated $\lfloor (t_1 - t_0)/2 \rfloor$ times. □

By the two inequalities of Lemma 4 we have that $h[t_1] \le \psi[t_1]$ and $\psi[t_0] \le 2h[t_0]$, respectively. Lemma 6 now yields the following.

Theorem 3. For every time interval $[t_0, t_1]$,

$$h[t_1] \le 2\left(1 - \frac{1}{8N^2}\right)^{\lfloor (t_1 - t_0)/2 \rfloor} h[t_0].$$



Corollary 2. In any execution of the gravitational algorithm in the asynchronous model, over every interval of $O(N^2)$ time units, the size of the $d$-dimensional convex hull of the robot locations and centers of gravity is halved in each dimension separately. □
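To see where the $O(N^2)$ in Corollary 2 comes from, one can compute the smallest time after which the bound of Theorem 3 falls to one half. The sketch takes the contraction factor $1 - 1/(8N^2)$ per two time units from Lemma 6 as stated above; since $-\ln(1-x) \ge x$, the result is at most $16N^2\ln 4 + 2 \approx 22.2N^2 + 2$ time units.

```python
import math


def halving_time(n):
    """Smallest T with 2 * (1 - 1/(8 n^2)) ** (T // 2) <= 1/2."""
    q = 1.0 - 1.0 / (8 * n * n)
    rounds, bound = 0, 2.0         # one round = two time units
    while bound > 0.5:
        bound *= q
        rounds += 1
    return 2 * rounds


for n in (2, 10, 50):
    t = halving_time(n)
    # t / n^2 approaches 16 ln 4 as n grows
    print(n, t, round(t / n ** 2, 1), round(16 * math.log(4), 1))
```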

An example of slow convergence is given by the following lemma.

Lemma 7. There exist executions of the gravitational algorithm in which $\Omega(N)$ time is required to halve the convex hull of $N$ robots in each dimension.

Proof: Initially, and throughout the execution, the $N$ robots are organized on the $x$-axis. The execution consists of phases of the following structure. Each phase takes exactly one time unit, from time $t$ to time $t+1$. At each (integral) time $t$, robot 1 is at one endpoint of the bounding segment $H[t]$ while the other $N-1$ robots are at the other endpoint of the segment. Robot 1 performs a Look at time $t$ and determines its perceived center of gravity $c_1[t]$ to reside at a distance $h[t]/N$ from the distant endpoint. Next, the other $N-1$ robots perform a long sequence of (fast) cycles, bringing them to within a distance $\varepsilon$ of robot 1, for arbitrarily small $\varepsilon$. Robot 1 then performs its movement to its perceived center of gravity $c_1[t]$. Hence the decrease in the size of the bounding interval during the phase is $h[t] - h[t+1] = \frac{h[t]}{N} + \varepsilon$, or in other words, at the end of the phase $h[t+1] \approx \left(1 - \frac{1}{N}\right)h[t]$. It follows that $\Theta(N)$ steps are needed to reduce the interval size to a $1/e$ fraction of its original size. □
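The schedule in the proof of Lemma 7 is easy to replay numerically. In this sketch the "long sequence of fast cycles" is modelled, as an assumption, by the $N-1$ co-located robots repeatedly performing a joint Look followed by a full Move until they are within $\varepsilon h[t]$ of robot 1; the function counts phases until the interval has halved, which grows like $N \ln 2$.

```python
import math


def lower_bound_phases(n, eps=1e-9, target=0.5):
    """Phases of the Lemma 7 schedule until the interval shrinks to target * h[0].

    Coordinates are renormalised each phase so that robot 1 sits at 0 and the other
    n - 1 robots at the far endpoint h (by symmetry the side does not matter).
    """
    h, phases = 1.0, 0
    while h > target:
        goal = (n - 1) * h / n     # robot 1's Look: h/n from the distant endpoint
        x = h                      # the other robots start at the far endpoint
        while x > eps * h:         # their fast cycles: joint Look, then full Move
            x = (n - 1) * x / n    # (robot 1, still at 0, counts in the average)
        h = goal - x               # robot 1 finally moves to its stale goal
        phases += 1
    return phases


for n in (4, 16, 64):
    print(n, lower_bound_phases(n), round(n * math.log(2), 1))
```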
Note that there is still a linear gap between the upper and lower bounds on the
convergence rate of the gravitational algorithm as stated in Cor. 2 and Lemma 7. 

It is interesting to compare these bounds with what happens in a variant of 
the fully synchronous model (cf. [27]), in which all robots operate at fixed time 
cycles, and the Look phase of all robots is simultaneous. It is usually assumed 
that the robots do not necessarily reach their desired destination. However, there 
is some constant minimum distance, $S$, which the robots are guaranteed to traverse at every step. Therefore, if $H[0]$ is the convex hull of the $N$ robots at time 0 and $h[0]$ is the maximum width of $H[0]$ in any of the $d$ dimensions, then we have the following.

Lemma 8. In any execution of the gravitational algorithm in the fully synchronous model, the robots achieve gathering in at most $\lceil 4h[0]d^{3/2}/S \rceil$ time.

Proof: If the distance of each robot from the center of gravity is at most $S$, then at the next step they will all gather. Suppose now that there exists at least one robot whose distance from the center of gravity is greater than $S$. Since the center of gravity is within the convex hull, the largest dimension is at least $h[0] > S/\sqrt{d}$. Without loss of generality, assume that the projection of the hull on the maximum width dimension is on the interval $[0, a]$, and that the projection of the center of gravity $c[0]$ is in the interval $[\frac a2, a]$. Then in each step, every robot moves towards the center of gravity by a distance $\min\{|r_i - c|, S'\}$ for some $S' \ge S$. By assumption, $a$ is the width of the largest dimension and therefore $a \ge |r_i - c|/\sqrt{d}$. For every robot in the interval $[0, \frac a4]$, the distance to the current center of gravity in this dimension will decrease in the next step by at least $\min\{\frac a4, \frac{S}{4\sqrt d}\} = \frac{S}{4\sqrt d}$. Thus, the width of at least one dimension decreases by at least $\frac{S}{4\sqrt d}$ in each step. Since the widths of the $d$ dimensions initially sum to at most $d\,h[0]$, gathering is achieved after at most $\lceil 4h[0]d^{3/2}/S \rceil$ cycles, independently of $N$. □
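The fully synchronous setting of Lemma 8 can also be simulated. In the sketch below every robot travels exactly $S$ whenever its goal is farther away, which is the least progress the model permits; the helper names and the random test configurations are illustrative assumptions, and the bound compared against is the one stated in Lemma 8.

```python
import math
import random


def sync_rounds(points, S, max_rounds=10**6):
    """Fully synchronous Go_to_COG in d dimensions.

    All robots Look simultaneously; each then moves towards the common center of
    gravity, stopping after exactly S units unless the goal is closer.
    Returns the number of rounds until all robots coincide.
    """
    pts = [list(p) for p in points]
    n, d = len(pts), len(pts[0])
    rounds = 0
    while any(p != pts[0] for p in pts):
        if rounds >= max_rounds:
            raise RuntimeError("no gathering within max_rounds")
        c = [sum(p[k] for p in pts) / n for k in range(d)]
        for p in pts:
            dist = math.dist(p, c)
            if dist <= S:
                p[:] = c                      # goal reached
            else:
                f = S / dist                  # stop after the guaranteed S units
                for k in range(d):
                    p[k] += f * (c[k] - p[k])
        rounds += 1
    return rounds


rng = random.Random(5)
for d in (1, 2, 3):
    pts = [[rng.uniform(0, 10) for _ in range(d)] for _ in range(7)]
    h0 = max(max(p[k] for p in pts) - min(p[k] for p in pts) for k in range(d))
    print(d, sync_rounds(pts, 0.5), math.ceil(4 * h0 * d ** 1.5 / 0.5))
```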

4 Fault Tolerance 

In this section we consider the behavior of the gravitational algorithm in the 
presence of possible robot failures. 

Let us first consider a model allowing only transient failures. Such a failure 
causes a change in the states of some robots, possibly into illegal states. Notice 
that Theorem 2 makes no assumptions about the initial positions and centers of 
gravity, other than that they are restricted to some finite region. It follows that, 
due to the oblivious nature of the robots (and hence the algorithm), the robots 
will converge regardless of any finite number of transient errors occurring in the 
course of the execution. 

We now turn to consider the crash fault model. This model, presented in [1], 
follows the common crash (or “fail-stop”) fault model in distributed computing,
and assumes that a robot may fail by halting. This may happen at any point in 
time during the robot’s cycle, i.e., either during the movement towards the goal 
point or before it has started. Once a robot has crashed, it will remain stationary 
indefinitely. 
In [1], it is shown that in the presence of a single crash fault, it is possible to gather the remaining (functioning) robots to a common point. Here, we avoid the gathering requirement and settle for the weaker goal of convergence. We show that the Go_to_COG algorithm converges for any number of crashed robots between 1 and $N-2$. In fact, in a sense convergence is easier in this setting, since the crashed robots determine the final convergence point for the nonfaulty robots. We have the following.

Theorem 4. Consider a group of $N$ robots that execute Algorithm Go_to_COG. If $1 \le M \le N-2$ robots crash during the execution, then the remaining $N-M$ robots will converge to the center of gravity of the crashed robots.

Proof: Consider an execution of the gravitational algorithm by a group of $N$ robots. Without loss of generality assume that the crashed robots were $1, \ldots, M$, and their crashing times were $t_1 \le \ldots \le t_M$, respectively. Consider the behavior of the algorithm starting from time $t_M$. For the analysis, a setting in which the $M$ robots crashed at general positions $r_1, \ldots, r_M$ is equivalent to one in which all $M$ crashed robots are concentrated in their center of gravity $\frac{1}{M}\sum_{i=1}^{M} r_i$. Assume, without loss of generality, that this center of gravity is at 0.

Now consider some time $t_0 > t_M$. Let $H[t_0] = [a, b]$ for some $a \le 0 \le b$. By Corollary 1, the robots will remain in the segment $[a, b]$ at all times $t > t_0$. The center of gravity calculated by any nonfaulty robot $M+1 \le j \le N$ at time $t > t_0$ will then be

$$c[t] = \frac{1}{N}\sum_{i=M+1}^{N} r_i[t] \in \left[\frac{N-M}{N}\,a,\ \frac{N-M}{N}\,b\right].$$

Hence, all centers of gravity calculated hereafter will be restricted to the segment $[a', b']$ where $a' = \frac{N-M}{N}a$ and $b' = \frac{N-M}{N}b$. Consequently, denoting by $\hat t$ the time by which every nonfaulty robot has completed a Look-Compute-Move cycle, we have that $H[\hat t] \subseteq [a', b']$, hence $h[\hat t] \le \frac{N-M}{N}\,h[t_0]$. Again, the argument can be extended to any number of dimensions by considering each dimension separately. It follows that the robots converge to a point, and since 0 lies in every such segment, this point is 0, the center of gravity of the crashed robots. □
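Theorem 4 can be illustrated with a planar sketch in which two of six robots crash before the run starts and stay put. This is an assumed random schedule, not a worst case: one nonfaulty robot at a time performs a Look-Compute-Move cycle, stopping at a random point but covering at least $\min\{S, \text{distance}\}$; the nonfaulty robots end up at the center of gravity of the crashed ones.

```python
import math
import random


def crash_run(n=6, m=2, events=20000, S=1e-3, seed=2):
    """Go_to_COG in the plane where robots 0..m-1 have crashed and never move.

    Each event lets one random nonfaulty robot Look, Compute the average of all n
    positions (crashed ones included), and Move towards it.
    Returns (final nonfaulty positions, center of gravity of the crashed robots).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)] for _ in range(n)]
    target = [sum(p[k] for p in pos[:m]) / m for k in range(2)]
    for _ in range(events):
        i = rng.randrange(m, n)                       # only nonfaulty robots act
        goal = [sum(p[k] for p in pos) / n for k in range(2)]
        dist = math.dist(pos[i], goal)
        if dist <= S:
            pos[i] = goal                             # goal reached
        else:
            f = rng.uniform(S, dist) / dist           # stop somewhere after S units
            pos[i] = [x + f * (g - x) for x, g in zip(pos[i], goal)]
    return pos[m:], target


live, target = crash_run()
print(max(math.dist(p, target) for p in live))
```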
Abstract. This paper introduces a new family of in-place sorting algorithms, the partition sorts. They are appealing both for their relative simplicity and their efficient performance. They perform $\Theta(n \log n)$ operations on the average, and $\Theta(n \log^2 n)$ operations in the worst case.
The partition sorts are related to another family of sorting algorithms 
discovered recently by Chen [Che02]. He showed empirically that one 
version ran faster, on the average, than quicksort, and that the algorithm 
family performed $\Theta(n \log n)$ comparisons in the worst case; however no
average case analysis was obtained. 

This paper completes the analysis of Chen’s algorithm family. In particular, a bound of $n \log n + O(n)$ comparisons and $\Theta(n \log n)$ operations is shown for the average case, and $\Theta(n \log^2 n)$ operations for the worst case.
The average case analysis is somewhat unusual. It proceeds by showing 
that Chen’s sorts perform, on the average, no more comparisons than 
the partition sorts. 

Optimised versions of the partition sort and Chen’s algorithm are very 
similar in performance, and both run marginally faster than an opti- 
mised quasi-best-of-nine variant of quicksort [BM93]. They both have a 
markedly smaller variance than the quicksorts. 



1 Introduction 

We consider the problem of in-place sorting of internally stored data with no 
a priori structure. We focus on the average case performance of deterministic 
sequential sorting algorithms. By in-place, we mean algorithms which use an 
additional $O(\log n)$ space; by average case, that all input orderings are equally
likely. 

A new family of in-place algorithms, which have a quicksort-like flavour, was 
proposed by Chen [Che02], and their worst case number of comparisons was
shown to be $\Theta(n \log n)$. Empirical evidence was presented for the conjecture
that their average case number of comparisons is close to the information theo- 
retic lower bound and the fact that some versions run faster than quicksort on 
the average (for medium sized sets of order 10^). The average case analysis of 
comparisons was posed as an open problem. Bounds on operation counts were 
not mentioned. 

* This work was supported in part by NSF grant CCR0105678.
In this paper, we introduce a related new family of in-place algorithms, which
we call partition sorts. The partition sorts are attractive, both because of their 
relative simplicity, and their asymptotic and practical efficiency. We show the 
following performance bounds for partition sort: 

1. The average case number of comparisons is $n \log n + O(n)$, and for one version, is at most $n\,\mathrm{plog}(n+1) - 1.193n + O(\log n)$ comparisons¹. They perform $\Theta(n \log^2 n)$ comparisons in the worst case.

2. They perform $\Theta(n \log n)$ operations on the average and $O(n \log^2 n)$ operations in the worst case.

As we show by means of a non-trivial argument, the average comparison cost 
of the partition sorts upper bounds that of Chen’s algorithm family; this bound is 
also used in the analysis of exchanges for Chen’s sorts. Using this and additional 
arguments, we obtain the following bounds for Chen’s family of algorithms: 

1. The number of comparisons performed by each version of Chen’s algorithm 
is no more than the number of comparisons performed by a corresponding 
version of partition sort, on the average. Hence Chen’s algorithm family 
performs $n \log n + O(n)$ comparisons on the average. We also give a simple
and tight analysis for the worst case number of comparisons.

2. They perform $O(n \log n)$ exchanges on the average and $O(n \log^2 n)$ exchanges in the worst case.

The quicksort partition procedure is a central routine in both families of 
algorithms, and as a consequence, they both have excellent caching behaviour. 
Their further speedup relative to quicksort is due to a smaller expected depth 
of recursion. 

We outline the analysis of the partition sort (Section 2) and the average 
case cost of comparisons for Chen’s algorithm (Section 3). We also present some 
empirical studies of these algorithms (Section 4). 



Prior Work 

It is well known [FJ59] that, for comparison based sorts of $n$ items, $\log n! = n\log n - n\log e + \frac12\log n + O(1) = n\log n - 1.443n + O(\log n)$ is a lower bound on both the average and the worst case number of comparisons, in the decision tree model. Merge-insertion sort [FJ59] has the best currently known worst case comparison count: $n\,\mathrm{plog}(\frac34 n) - n + O(\log n) = n\,\mathrm{qlog}\,n - 1.415n + O(\log n)$, at the expense of $O(n^2)$ exchanges²; it is, however, not clear how to implement it in $O(n \log n)$ operations while maintaining this comparison count; the best result is an in-place mergesort which uses merge-insertion sort for constant size
¹ We let $\mathrm{plog}\,n$ denote the piecewise log function $\lfloor \log n \rfloor + \frac{1}{2^{\lfloor \log n \rfloor}}\left(n - 2^{\lfloor \log n \rfloor}\right)$, which is an approximation to the log function, matching it at $n = 2^k$, for integers $k \ge 0$.

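² $\mathrm{qlog}\,n \triangleq \lfloor \log \frac34 n \rfloor + \log \frac43 + \frac{3}{4 \cdot 2^{\lfloor \log \frac34 n \rfloor}}\left(n - \frac43\, 2^{\lfloor \log \frac34 n \rfloor}\right)$, i.e., the approximation to the log function with equality roughly when $n = \frac43\, 2^k$, for integers $k \ge 0$.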
subproblems [Rei92]; this achieves $O(n \log n)$ operations and comes arbitrarily close to the above comparison count, at the cost of increasingly large constants in the operation count. The trivial lower bound of $\Omega(n)$ on both the average and the worst case number of exchanges is met by selection sort at the expense of $O(n^2)$ comparisons. A different direction is to simultaneously achieve $O(n \log n)$ comparisons, $O(n)$ exchanges, and $O(1)$ additional space, thus meeting the lower bounds on all resources; this was recently attained [FG03] but its efficiency in practice is unclear.
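For concreteness, the piecewise log $\mathrm{plog}$ defined in the footnote above and the constant in the lower bound can be evaluated directly. The helper names below are hypothetical; `plog` linearly interpolates $\log_2$ between consecutive powers of two, so it agrees with $\log_2$ at powers of two and lies below it by less than 0.0861 elsewhere, and $\log_2 n!$ is obtained from `math.lgamma`.

```python
import math


def plog(x):
    """Piecewise log: floor(log2 x) + (x - 2^floor(log2 x)) / 2^floor(log2 x), x >= 1."""
    k = int(x).bit_length() - 1          # floor(log2 x), computed exactly
    return k + (x - 2 ** k) / 2 ** k


def log2_factorial(n):
    """log2(n!) via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


n = 10 ** 6
print(plog(1024), plog(1536), math.log2(1536))    # 10.0 10.5 and about 10.585
# log2(n!) - n log2 n is close to -n log2 e, i.e. about -1.443 n
print(log2_factorial(n) - n * math.log2(n), -math.log2(math.e) * n)
```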

Hoare’s quicksort [Hoa61, Hoa62] has been the in-place sorting algorithm with the best currently known average case performance, namely $2n\ln n = 1.386\,n\log n$ comparisons, with worst case number of exchanges $O(n \log n)$; it runs in $O(\log n)$ additional space with modest exchange counts, and has excellent caching behaviour. Its worst case number of comparisons, however, is $\Theta(n^2)$. These facts remain essentially the same for the many deterministic variants of the algorithm that have been proposed (such as the best-of-three variant [Hoa62, Sin69] which performs $\frac{12}{7}n\ln n = 1.188\,n\log n$ comparisons on the average, and the quasi-best-of-nine variant [BM93] which empirically seems to perform $1.094\,n\log n$ comparisons on the average).

Classical mergesort performs $n\log n - n + 1$ comparisons and $O(n \log n)$ operations in the worst case and exhibits good caching behaviour, but is not in-place, requiring $O(n)$ additional space. In-place mergesorts with $O(1)$ additional space have been achieved [Rei92, KPT96] with the same bounds but their complex index manipulations slow them down in practice.

Heapsort, due to Williams and Floyd [Wil64, Flo64], is the only practical in-place sorting algorithm known that performs $O(n \log n)$ operations in the worst case. Bottom-up heapsort [Weg93], originally due to Floyd, performs $n\log n + O(n)$ comparisons on the average and $\frac32 n\log n + O(n)$ comparisons in the worst case, with $O(1)$ additional space; weak-heapsort [Dut93, EW00] performs $n\log n + 1.1n$ comparisons in the worst case while using $n$ additional bits. The exchange counts are also modest. Nonetheless, its average case behaviour is not competitive with that of quicksort; it does not exhibit substantial locality of reference and deteriorates due to caching effects [LL97].



2 The Partition Sort 



We present the partition sort as a one parameter family of algorithms, with parameter $\gamma > 1$.

The sort of an array of $n$ items begins with a recursive sort of the first $\lfloor n/\gamma \rfloor$ items, forming a subarray $S$ of sorted items.

The heart of the algorithm is a partition sort completion procedure that completes the sort of the following type of partially sorted array. The array consists of two adjacent subarrays $S$ and $U$: the portion to the left, $S$, is sorted; the remainder, $U$, is unsorted; $s \triangleq |S|$ and $u \triangleq |U|$. For a call $(S, U)$, sort completion proceeds in two steps:
Multiway Partitioning: The items of $U$ are partitioned into $s+1$ buckets, $U_0, U_1, \ldots, U_s$, defined by consecutive items of the sorted subarray $S = (s_1, \ldots, s_s)$, where $s_1 < s_2 < \cdots < s_s$. $U_k$ contains the items in $U$ lying between $s_k$ and $s_{k+1}$ (we define $s_0 \triangleq -\infty$ and $s_{s+1} \triangleq +\infty$). Thus the items of $S$ act as pivots in a multiway partitioning of $U$. We describe the implementation of this partitioning below.

Sorting of Buckets: Each bucket $U_k$, $0 \le k \le s$, is sorted by insertion sort. In order to bound the worst case operation count, if $|U_k| > c\log n$ (where $c$ is a suitable constant) $U_k$ is sorted recursively; in this case we say $U_k$ is large, otherwise we say it is small.

The multiway partitioning may be thought of as performing, for each item in 
U, a binary search over the sorted subarray S. Nevertheless, so as to implement 
it in-place and to minimise the number of exchanges performed, we use a recursive multiway partitioning procedure, essentially due to Chen [Che02], described below for a call $(S, U)$.

First the unsorted subarray $U$ is partitioned about the median $x$ of $S$; this creates the ordering $S_L\,x\,S_R\,U_L\,U_R$, with $S_L < x < S_R$ and $U_L < x < U_R$³. Next the blocks $\{x\} \cup S_R$ and $U_L$ are swapped (as described in the next paragraph), preserving the order of $\{x\} \cup S_R$ but not necessarily of $U_L$, yielding the ordering $S_L\,U_L'\,x\,S_R\,U_R$ with $S_L \cup U_L' < x < S_R \cup U_R$. Then the subarrays $(S_L, U_L')$ and $(S_R, U_R)$ are partitioned recursively. The base case for the recursion arises when the sorted subarray $S$ is empty; then the unsorted subarray $U$ forms one of the sought buckets.

The partitioning of $U$ about $x$ may be done using any of the partitioning routines developed for quicksort. A simple way of swapping blocks $\{x\} \cup S_R$ and $U_L$ is to walk through items in $\{x\} \cup S_R$ from right to left, swapping each item $a$ thus encountered with the item in $a$'s destination position.
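The right-to-left walk can be written in a few lines. In this sketch (function and argument names are illustrative), block $A$ stands for $\{x\} \cup S_R$ and block $B$ for $U_L$; every item of $A$ is exchanged exactly once, so the walk performs $|A|$ exchanges and keeps $A$ in order, while $B$ may come out permuted.

```python
def swap_blocks(a, start, p, q):
    """Turn A B into B' A in place, where A = a[start:start+p] and
    B = a[start+p:start+p+q]. A keeps its order; B may be permuted.

    Walk through A from right to left, swapping each item with the item in its
    destination slot (q positions further right): exactly p exchanges.
    """
    for j in range(p - 1, -1, -1):
        k = start + j
        a[k], a[k + q] = a[k + q], a[k]


a = ['x', 's1', 's2', 'u1', 'u2', 'u3', 'u4']
swap_blocks(a, 0, 3, 4)
print(a)   # ['u2', 'u3', 'u4', 'u1', 'x', 's1', 's2']
```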

It is readily seen that each item $u \in U$ performs exactly the same comparisons as in a binary search.

By solving the smaller subproblem first, and eliminating the remaining tail recursion, the additional space needed may be limited to $O(\log n)$ in the worst case.
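Putting the pieces together, a compact Python sketch of partition sort with $\gamma = 2$ follows. It is an illustration rather than the authors' tuned implementation: partitioning about $x$ uses a simple Lomuto-style pass, keys equal to $x$ are sent to $U_R$, the recursion is not restructured to bound the stack, and the constant $c = 4$ in the large-bucket threshold $c\log n$ is an arbitrary assumption.

```python
import math


def _insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi):
        v, j = a[i], i - 1
        while j >= lo and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v


def _swap_blocks(a, start, p, q):
    # A = a[start:start+p] (order kept) and B = a[start+p:start+p+q] trade places.
    for j in range(p - 1, -1, -1):
        a[start + j], a[start + j + q] = a[start + j + q], a[start + j]


def _multiway(a, lo, mid, hi, buckets):
    """Partition U = a[mid:hi] using the sorted S = a[lo:mid] as pivots."""
    if mid == lo:                       # S empty: U is one of the buckets
        buckets.append((lo, hi))
        return
    m = lo + (mid - lo) // 2            # position of the median x of S
    x = a[m]
    q = mid                             # partition U about x: U_L < x <= U_R
    for j in range(mid, hi):
        if a[j] < x:
            a[q], a[j] = a[j], a[q]
            q += 1
    nl = q - mid                        # |U_L|
    _swap_blocks(a, m, mid - m, nl)     # S_L x S_R U_L U_R -> S_L U_L' x S_R U_R
    _multiway(a, lo, m, m + nl, buckets)               # (S_L, U_L')
    _multiway(a, m + nl + 1, mid + nl, hi, buckets)    # (S_R, U_R)


def partition_sort(a, lo=0, hi=None, c=4):
    """Partition sort with gamma = 2 (sketch; c is an assumed constant)."""
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 1:
        return
    mid = lo + n // 2                   # recursively sort the first floor(n/2) items
    partition_sort(a, lo, mid, c)
    buckets = []
    _multiway(a, lo, mid, hi, buckets)
    limit = c * math.log2(n)
    for b_lo, b_hi in buckets:
        if b_hi - b_lo > limit:         # large bucket: sort recursively
            partition_sort(a, b_lo, b_hi, c)
        else:                           # small bucket: insertion sort
            _insertion_sort(a, b_lo, b_hi)


data = [5, 3, 9, 1, 3, 7]
partition_sort(data)
print(data)   # [1, 3, 3, 5, 7, 9]
```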

Remark 1. The case $\gamma = \sqrt{n}$ corresponds to an in-place sequential implementation of parallel quicksort [Rei85].



2.1 Worst Case Analysis of Operations 

We analyse the algorithm for the case $\gamma = 2$ in detail, focusing on the comparisons initially.

Observe that the number of comparisons occurring during the insertion sorts is bounded by $O(n \log n)$. Thus, it suffices to analyse the remaining comparisons.

³ $a < B$ means that for all $b \in B$, $a < b$. $B < c$ is defined analogously.
Let $B(s,u)$ be the worst case number of non-insertion sort comparisons for sort completions on inputs of size $(s,u)$, and let $C(n)$ be the worst case number of non-insertion sort comparisons for the partition sort on inputs of size $n$.

Each item $u \in U$ performs at most $\lceil \log(s+1) \rceil$ comparisons during the multiway partitioning. It remains to bound the comparisons occurring during the sorting of buckets. If $l_k \triangleq |U_k|$, then $\sum_{k=0}^{s} l_k = u$ and



$$B(s,u) \le \max_{l_0 + \cdots + l_s = u}\left\{ u\lceil \log(s+1) \rceil + \sum_{k=0}^{s} C(l_k) \right\} \le u\lceil \log(s+1) \rceil + C(u),$$

where we may have overestimated since a term $C(l_k)$ is needed only if $l_k > c\log n$. Clearly, $C(1) = 0$ and, for $n > 1$:



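$$C(n) \le C\left(\left\lfloor \frac n2 \right\rfloor\right) + B\left(\left\lfloor \frac n2 \right\rfloor, \left\lceil \frac n2 \right\rceil\right) \le C\left(\left\lfloor \frac n2 \right\rfloor\right) + \left\lceil \frac n2 \right\rceil \left\lceil \log\left(\left\lfloor \frac n2 \right\rfloor + 1\right) \right\rceil + C\left(\left\lceil \frac n2 \right\rceil\right)$$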



Thus, $C(n) = \Theta(n\log^2 n)$, since the bound is tight on an input in sorted order, and we have shown:



Lemma 1. The partition sort, with $\gamma = 2$, performs $\Theta(n\log^2 n)$ comparisons in the worst case.
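The recurrence can be evaluated exactly to see the $\Theta(n\log^2 n)$ growth. For $n = 2^m$ it unrolls to $C(2^m) = 2C(2^{m-1}) + m\,2^{m-1}$, hence $C(n) = n\,m(m+1)/4 \approx \frac14 n\log^2 n$; the sketch below checks this closed form, using the fact that $\lceil\log_2(s+1)\rceil$ is the bit length of $s$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def C(n):
    """Worst-case recurrence for gamma = 2:
    C(1) = 0,  C(n) = C(floor(n/2)) + C(ceil(n/2)) + ceil(n/2) * ceil(log2(floor(n/2) + 1)).
    """
    if n <= 1:
        return 0
    s, u = n // 2, n - n // 2
    return C(s) + C(u) + u * s.bit_length()   # bit_length(s) == ceil(log2(s + 1))


for m in (10, 20):
    n = 2 ** m
    print(n, C(n), C(n) / (n * m * m))        # ratio (m + 1) / (4 m), tending to 1/4
```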



For larger $\gamma$, including the operation count, we can show:

Theorem 1. The partition sort performs operations in the worst 

case. 



2.2 Average Case Analysis of Operations 

Again, we analyse the case $\gamma = 2$ in detail; as before, the comparison count dominates the operation count, and thus it suffices to analyse comparisons.

For simplicity, we assume initially that the algorithm sorts all buckets, large and small, using insertion sort. We will ensure, with a suitable choice of $c$, that forgoing the recursive calls to partition sort only makes the sorting of large buckets less efficient on the average; hence the average case bound that we obtain is a valid (if slightly weak) bound for the original algorithm too.

Let $B(s,u)$ be the average case number of comparisons for sort completions on inputs of size $(s,u)$, and $C(n)$ the average case number of comparisons for the partition sort on inputs of size $n$. Let $I(l)$ denote the average case number of comparisons for sorting a bucket of size $l$. It is easy to show:

Lemma 2. If $s + 1 = 2^k + h$ with $0 \le h < 2^k$, then the partitioning for an input of size $(s,u)$ performs $u\left(k + \frac{2h}{s+1}\right)$ comparisons on the average.
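Lemma 2 can be verified exactly for small $s$. The sketch below (names are illustrative) lists, for the median-splitting recursion described above, the number of comparisons needed to send an item to each of the $s+1$ gaps of $S$; when all input orderings are equally likely each gap is equally likely, so the average is the mean of these depths, which matches $k + 2h/(s+1)$.

```python
from fractions import Fraction


def gap_depths(s, depth=0):
    """Comparisons needed to route an item to each of the s + 1 gaps of a sorted
    S of size s, when the procedure compares with the median x = S[s // 2] and
    recurses on S_L (size s // 2) or S_R (size s - s // 2 - 1)."""
    if s == 0:
        return [depth]                 # S empty: the item has reached its bucket
    left = s // 2
    return gap_depths(left, depth + 1) + gap_depths(s - left - 1, depth + 1)


def average_partition_cost(s):
    depths = gap_depths(s)
    return Fraction(sum(depths), len(depths))   # all s + 1 gaps equally likely


def lemma2_cost(s):
    k = (s + 1).bit_length() - 1       # s + 1 = 2^k + h with 0 <= h < 2^k
    h = s + 1 - 2 ** k
    return k + Fraction(2 * h, s + 1)


print(average_partition_cost(4), lemma2_cost(4))   # 12/5 12/5
```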
We now bound $B(s,u)$ by overestimating the number of comparisons required, on the average, for the sorting of buckets; with $s + 1 = 2^k + h$, where $0 \le h < 2^k$, we have:

$$B(s,u) \le u\left(k + \frac{2h}{s+1}\right) + \sum_{\text{item } a}\ \sum_{l=2}^{u} \Pr[a \text{ is in } U \text{ and its bucket has size } l]\cdot\frac{I(l)}{l}.$$



The probability for an arbitrary item $a$ in $S \cup U$ to be in $U$ and to be in a bucket of size $l$ may be overestimated as follows. For an arbitrary item $a$, the end points of a bucket of size $l$ (which are in $S$) may be chosen in at most $l$ distinct ways, and given such a pair of end points, the probability that they form a bucket (which is the same as saying that the remaining $s - 2$ items in $S$ are chosen from outside this range) is exactly:

$$\binom{s+u-l-2}{s-2} \Big/ \binom{s+u}{s},$$



unless $a$ is among the least or greatest $l$ items of $S \cup U$; to avoid anomalies, it suffices to pretend the buckets at the two extremes are combined to form a single bucket; then the same probability applies to every item, and clearly we have only overestimated the cost of the algorithm. Hence:



$$B(s,u) \le u\left(k + \frac{2h}{s+1}\right) + (s+u)\sum_{l=2}^{u} I(l)\,\binom{s+u-l-2}{s-2} \Big/ \binom{s+u}{s}.$$
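The exact probability used above can be checked by brute force for small $s$ and $u$. The Python sketch below is an illustration with hypothetical names: it treats the positions occupied by $S$ in the sorted order of $S \cup U$ as a uniformly random $s$-subset of the $s+u$ positions, and compares the frequency with which two given positions delimit a bucket of exactly $l$ items of $U$ against $\binom{s+u-l-2}{s-2}/\binom{s+u}{s}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def exact(s, u, l):
    """The closed form used in the text."""
    return Fraction(comb(s + u - l - 2, s - 2), comb(s + u, s))

def brute_force(s, u, l, left):
    """Pr[positions left and left+l+1 are in S and the l positions between are in U]."""
    n, right = s + u, left + l + 1
    hits = total = 0
    for chosen in combinations(range(n), s):   # every s-subset is equally likely
        S = set(chosen)
        total += 1
        if left in S and right in S and not any(p in S for p in range(left + 1, right)):
            hits += 1
    return Fraction(hits, total)

print(exact(4, 4, 2), brute_force(4, 4, 2, 1))
```

The brute force also shows that the probability does not depend on where the pair of end points sits, which is what lets the text apply one probability to every item.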



We may now obtain a bound on C(n). For n > 1, we have: 






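$$C(n) \le C(\lfloor n/2 \rfloor) + \lceil n/2 \rceil\left(k + \frac{2h}{\lfloor n/2 \rfloor + 1}\right) + n\sum_{l=2}^{\lceil n/2 \rceil} I(l)\,\binom{n-l-2}{\lfloor n/2 \rfloor - 2} \Big/ \binom{n}{\lfloor n/2 \rfloor},$$

where $\lfloor n/2 \rfloor + 1 = 2^k + h$ and $0 \le h < 2^k$.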
Clearly, $I(l) \le \binom{l}{2}$.

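Lemma 3.

$$\sum_{l=2}^{\lceil n/2 \rceil} \binom{l}{2}\binom{n-l-2}{\lfloor n/2 \rfloor - 2} \Big/ \binom{n}{\lfloor n/2 \rfloor} \le \frac{1}{2} + O\!\left(\frac{1}{n}\right).$$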

Substituting into the bound for $C(n)$ yields:



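$$C(n) \le C(\lfloor n/2 \rfloor) + \lceil n/2 \rceil\left(k + \frac{2h}{\lfloor n/2 \rfloor + 1}\right) + \frac{n}{2} + O(1).$$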



Lemma 4. If all input orderings are equally likely, the partition sort, with $\gamma = 2$, performs at most $n\,\mathrm{plog}(n+1) - n + O(\log n)$ comparisons on the average.

By bounding $I(l)$ more exactly, we can show:

Theorem 2. If all input orderings are equally likely, the partition sort, with $\gamma = 2$, performs at most $n\,\mathrm{plog}(n+1) - (\frac{1}{2} + \ln 2)\,n + O(\log n) = n\,\mathrm{plog}(n+1) - 1.193n + O(\log n)$ comparisons and $O(n\log n)$ operations on the average.
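As an illustration of what bounding $I(l)$ more exactly involves, the Python sketch below computes $I(l)$ exhaustively for small $l$. It assumes a plain insertion sort that scans leftwards from the right end and never compares against sentinels or bucket delimiters; this is our assumption, since the bucket-sorting code itself is not shown in the text. The sketch checks the standard closed form $I(l) = \frac{l(l-1)}{4} + l - H_l$, where $H_l$ is the $l$-th harmonic number, and the cruder worst-case bound $\binom{l}{2}$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def insertion_sort_comparisons(items):
    """Comparisons made by straight insertion sort (no sentinel) on `items`."""
    a, count = list(items), 0
    for i in range(1, len(a)):
        v, j = a[i], i - 1
        while j >= 0:
            count += 1              # compare v with a[j]
            if a[j] > v:
                a[j + 1] = a[j]     # shift the larger item right
                j -= 1
            else:
                break               # a[j] <= v: v belongs just after a[j]
        a[j + 1] = v
    return count

def average_comparisons(l):
    """I(l): exact average over all l! equally likely input orders."""
    perms = list(permutations(range(l)))
    return Fraction(sum(insertion_sort_comparisons(p) for p in perms), len(perms))

def closed_form(l):
    harmonic = sum(Fraction(1, i) for i in range(1, l + 1))
    return Fraction(l * (l - 1), 4) + l - harmonic

for l in range(1, 6):
    print(l, average_comparisons(l), closed_form(l), comb(l, 2))
```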
3 Chen’s Algorithm

We slightly generalise Chen’s algorithm to a three-parameter family of algorithms, with parameters $\gamma$, $\lambda$, and $\mu$ described below. Typically, $\gamma \ge 2$ and a power of 2, $\lambda \ge 2$ and a power of 2, $\lambda \le \mu$ and $\gamma \le \mu$; Chen [Che02] had considered a one-parameter family with $\gamma = \lambda$ and $\mu = \lambda(\lambda - 1)$, for arbitrary $\lambda \ge 2$.

As with the partition sorts, the sort begins with a recursive sort of the first $n/\gamma$ items. The heart of the algorithm is provided by Chen’s sort completion procedure, for completing the sort of a partially sorted array $(S, U)$; as before, $S$ is sorted and $U$ is unsorted. This recursive procedure is similar to that used in the partition sort, except that it proceeds differently when $S$ is small relative to $U$, and has the following form:

Case $\mu(s+1) - 1 \ge s + u$: Thus, $(\mu - 1)(s+1) \ge u$; we say that such an input has the balanced property.

First, the unsorted subarray $U$ is partitioned about the median $x$ of $S$ (so that $S = S_L\, x\, S_R$), resulting in blocks $U_L$ and $U_R$; next, the blocks $\{x\} \cup S_R$ and $U_L$ are swapped, preserving the order of items in $\{x\} \cup S_R$, but not necessarily of those in $U_L$, yielding the ordering $S_L\, U_L\, x\, S_R\, U_R$ with $S_L \cup U_L < x < S_R \cup U_R$; finally, the subarrays $(S_L, U_L)$ and $(S_R, U_R)$ are sorted recursively.

Case $\mu(s+1) - 1 < s + u$: Thus, $(\mu - 1)(s+1) < u$; we call this case expansion, and in particular for $\lambda = 2$, doubling.

First, the sorted subarray $S$ is expanded as follows: Let $S'$ be the leftmost subarray of size $\lceil \lambda(s+1) - 1 \rceil$. The subarray $S'$ is sorted recursively, with $S$ as the sorted subarray; then, the sort of the whole array continues recursively, with $S'$ as the sorted subarray.

Chen’s description [Che02] of his algorithm can be viewed as bottom-up; $(s,u)$ is set to $(1, n-1)$ initially, which is analogous to setting $\gamma = \lambda$; we have replaced his single parameter with three.
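The following Python sketch shows the control flow of the sort completion procedure described above. It is an illustration only, with hypothetical names `complete` and `chen_sort`: list slicing stands in for Chen's in-place block swap (so it allocates memory and keeps $U_L$ in its original order), the median is taken as the item at index $\lfloor s/2 \rfloor$ of $S$, there is no quicksort cutover, and the driver starts bottom-up from a single sorted item, as in Chen's description.

```python
def complete(a, lo, s, u, lam, mu):
    """Finish sorting a[lo:lo+s+u], given that a[lo:lo+s] (the block S) is sorted."""
    if s == 0:
        if u == 0:
            return
        s, u = 1, u - 1                      # a single item is trivially sorted
    if u == 0:
        return
    if (mu - 1) * (s + 1) >= u:              # balanced case
        m = s // 2                           # S = S_L x S_R with x = a[lo + m]
        x = a[lo + m]
        S = a[lo:lo + s]
        U = a[lo + s:lo + s + u]
        U_L = [v for v in U if v < x]
        U_R = [v for v in U if v >= x]
        # Same layout as swapping the blocks {x} + S_R and U_L: S_L U_L x S_R U_R
        a[lo:lo + s + u] = S[:m] + U_L + [x] + S[m + 1:] + U_R
        complete(a, lo, m, len(U_L), lam, mu)
        complete(a, lo + m + len(U_L) + 1, s - m - 1, len(U_R), lam, mu)
    else:                                    # expansion case (doubling when lam == 2)
        s2 = lam * (s + 1) - 1               # size of S', the leftmost subarray
        complete(a, lo, s, s2 - s, lam, mu)  # sort S' with S as its sorted part
        complete(a, lo, s2, s + u - s2, lam, mu)   # continue with S' as sorted part

def chen_sort(a, lam=2, mu=4):
    """Sort the list a in place (requires 2 <= lam <= mu) and return it."""
    complete(a, 0, 0, len(a), lam, mu)
    return a

print(chen_sort([5, 3, 9, 1, 4, 1, 8], lam=2, mu=4))
```

Because $\lambda \le \mu$, the expansion case guarantees $\lambda(s+1) - 1 < s + u$, so $S'$ always fits inside the current subarray and the recursion terminates.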

3.1 Average Case Analysis of Comparisons 

We prove that on average the partition sort with parameter $\gamma$ performs at least as many comparisons as (dominates) Chen’s algorithm with parameters $(\lambda, \mu, \gamma)$. We analyse the case $\lambda = 2$ in detail.

For purposes of the analysis, we introduce another sorting algorithm, which we call the variant partition sort (to contrast it with the original partition sort, henceforth qualified as basic to emphasise the distinction). We prove two claims:



Claim 1. The basic partition sort dominates the variant partition sort. 

Claim 2. If Claim 1 holds, then the basic partition sort dominates Chen’s algo- 
rithm. 
It is helpful to view the basic partition sort as follows, when its input is an unbalanced problem instance $(S, U)$ with $(\mu - 1)(s+1) < u$ (where $s = |S|$ and $u = |U|$); we name the $s$ items in $S$ type A items, the immediately following $s + 1$ items in $U$ type B items, and the remaining $t \stackrel{\mathrm{def}}{=} u - (s+1)$ items in $U$ type C items; the basic partition sort then proceeds as follows:

P1: It places each of the u items of U in one of the s + 1 buckets delimited by items in S; we call these the A buckets. Each placement takes log(s + 1) comparisons.

P2: It sorts each of the s + 1 A buckets by insertion sort. 

The variant partition sort, which mimics Chen’s algorithm to a certain extent, 
proceeds as follows: 

V1: It places each of the s + 1 type B items of U in one of the s + 1 A buckets delimited by items in S. Each placement takes log(s + 1) comparisons.

V2: It sorts each of the s + 1 A buckets by insertion sort.

V3: It places each of the t type C items of U in one of the 2s + 2 buckets delimited by items in A ∪ B; we call these the AB buckets. Each placement takes 1 + log(s + 1) comparisons.

V4: It sorts each of the 2s + 2 AB buckets by insertion sort. 

Lemma 5. For $\lambda = 2$, Claim 2 holds.

Proof. We proceed by induction on s + u. Clearly, for s + u = 1, the result holds. 

For $s + u > 1$, if $(\mu - 1)(s+1) \ge u$, the initial call in Chen’s algorithm is a non-doubling call, and hence the same as in the partition sort; for this case the result follows by induction applied to the recursive calls.

It remains to consider the unbalanced case with $(\mu - 1)(s+1) < u$. Here the initial call in Chen’s algorithm is a doubling call. In the variant partition sort, the same doubling call occurs, but thereafter it performs no more doubling calls, i.e., in its recursive calls on subproblems $(S, U)$ it acts in the same way as the basic partition sort on input $(S, U)$. We consider the two top-level recursive calls generated by the doubling call $(s, s+1)$, namely $(|S_L|, u_1)$ and $(|S_R|, s + 1 - u_1)$, where $u_1 = |U_L|$, and the call after the doubling completes, $(2s+1, u - s - 1)$. For each value of $u_1$, all possible orderings of the items are equally likely, given that the initial $(s,u)$ problem was distributed uniformly at random. Consequently, the inductive hypothesis can be applied to these three inner calls, showing that the basic partition sort dominates Chen’s algorithm in each of these inner calls. Hence the variant partition sort dominates Chen’s algorithm for the original $(s,u)$ call. Given our assumption that the basic partition sort dominates the variant partition sort, this proves the inductive step. □

Lemma 6. For $\lambda = 2$, Claim 1 holds.

Proof. To facilitate the comparison of the basic partition sort with its variant, 
we view the execution of the basic partition sort in the following alternative way; 
this form has exactly the same comparison cost as the previous description. 
P1': It places each of the s + 1 type B items of U in one of the s + 1 A buckets delimited by items in S. (Each placement takes log(s + 1) comparisons.) This is identical to step V1.

P2': It sorts each of the s + 1 A buckets (containing type B items) by insertion sort. This is identical to step V2.

P3': It places each of the t type C items of U at the right end of one of the s + 1 A buckets in A ∪ B (which already contain type B items in sorted order). (Each placement is done by a binary search over the A items, taking log(s + 1) comparisons.)

P4': For each of the s + 1 A buckets, it places the type C items in their correct 
position by insertion sort (at the start of this step, each such A bucket 
contains both type B and type C items, with type B items in sorted order, 
and type C items at the right end). 



It remains to determine the relative costs of steps V3 and V4 and of steps P3' 
and P4'. Step V3 performs one more comparison for each type C item, for a total 
excess of t comparisons (this is clear for $s = 2^k - 1$, and needs a straightforward
calculation otherwise). In step V4, the only comparisons are between type C 
items within the same AB bucket, whereas in step P4' there are, in addition, 
comparisons between type C items in distinct AB buckets (but in the same A 
bucket), and also comparisons between type B and type C items in the same 
A bucket, called BC comparisons. We confine our attention to BC comparisons 
and show that there are at least t such comparisons, on the average; it would 
then follow that on average the partition sort dominates its variant. 

If it were the case that one type B item (s + 1 in number) went into each of 
the s + 1 A buckets, resulting in an equipartition, it is evident that the number 
of BC comparisons would be exactly t. We now argue that this is in fact a 
minimum, when considering comparisons on the average. 

Given the AB sequence with equipartition (i.e., with one B item in each A 
bucket) consider perturbing it to a different sequence. For any such non-trivial 
perturbation, there is at least one A bucket which holds a group of r + 1 > 1 
type B items (and which contains r + 2 AB buckets). 

Consider an A bucket with multiple type B items in the perturbed sequence. 
The r excess type B items must have been drawn from (originally) singleton A 
buckets (which originally contained two AB buckets) and thus there are r empty 
A buckets (which now contain only one AB bucket). An arbitrary type C item 
could go into any of the 2s + 2 AB buckets with equal probability. The number of
BC comparisons that this type C item undergoes, on the average, increases by: 



$$\frac{1}{2s+2}\left(-2\cdot(r+1) + (r+2)\cdot\frac{r+3}{2} - 1\right) = \frac{r(r+1)}{2(2s+2)} > 0$$

for $r \ge 1$ as a consequence of the perturbation.

Finally, we note that in the perturbed sequence, for each A bucket with 
r + 1 > 1 type B items, the number of BC comparisons an arbitrary type C 
item undergoes increases, on the average, by the above quantity. 



The Average Case Analysis of Partition Sorts 249 



Thus, the equipartition configuration minimises the number of BC 
comparisons, and consequently, the basic partition sort dominates its variant. 

□ 
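The equipartition argument can be checked numerically for small $s$. The Python sketch below is our illustration, with hypothetical names. As in the proof, it assumes that a type C item falls into each of the $2s+2$ AB buckets with equal probability and is inserted from the right end of its A bucket, and it counts only BC comparisons: one for every larger type B item the C item passes, plus one for the smaller type B item that stops it, if there is one. For every way of distributing the $s+1$ type B items among the $s+1$ A buckets, it computes the expected number of BC comparisons per type C item and confirms that the equipartition attains the minimum value 1, i.e. $t$ comparisons in total.

```python
from fractions import Fraction
from itertools import product

def bc_comparisons(b, j):
    """BC comparisons for a C item inserted from the right end of an A bucket
    holding b sorted B items, when exactly j of those B items are smaller."""
    passed = b - j                     # larger B items: compared with and passed
    stopper = 1 if j >= 1 else 0       # the nearest smaller B item stops the scan
    return passed + stopper

def expected_bc_per_c_item(config):
    """config[i] = number of B items in A bucket i; AB buckets equally likely."""
    ab_buckets = sum(b + 1 for b in config)          # always 2s + 2
    total = sum(bc_comparisons(b, j) for b in config for j in range(b + 1))
    return Fraction(total, ab_buckets)

def check(s):
    """(value at the equipartition, minimum over all distributions of B items)."""
    configs = [c for c in product(range(s + 2), repeat=s + 1) if sum(c) == s + 1]
    values = {c: expected_bc_per_c_item(c) for c in configs}
    equipartition = tuple([1] * (s + 1))
    return values[equipartition], min(values.values())

print([check(s) for s in range(1, 4)])
```

Under this counting, merging $r+1$ singleton A buckets into one bucket increases the BC total over the affected AB buckets by $r(r+1)/2$, which is the numerator of the quantity displayed above.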

Setting $\gamma = 2$ to exhibit a bound, from dominance and Theorem 2 we have:

Theorem 3. If all input orderings are equally likely, Chen’s algorithm, with $(\lambda, \gamma) = (2, 2)$, performs at most $n\,\mathrm{plog}\,n - (\frac{1}{2} + \ln 2)\,n + O(\log n)$ comparisons on the average, independently of $\mu$.

3.2 Operation Counts 

Theorem 4. Chen’s algorithm performs 0(^logA nlogn) comparisons in the worst case, independently of $\gamma$.

Theorem 5. Chen’s algorithm performs exchanges in the worst case. 

Theorem 6. If all input orderings are equally likely, Chen’s algorithm with $\lambda(1 + \epsilon) < \mu$ performs $O(\gamma n\log n)$ exchanges on the average.

4 Empirical Studies 

We implemented and compared the performance of quicksort, the partition sort and Chen’s algorithm. We measured the running times and counted comparisons $C_n$ and data moves $M_n$, for various input sizes $n$.

We implemented quicksort in two different ways. The first largely follows Sedgewick [Sed78]; it uses a best-of-three strategy for selecting the pivot [Hoa62, Sin69], and sorts small subproblems (of size less than the insertion cutover) using insertion sort, but, unlike Sedgewick, it performs the insertion sorts as they arise (locally), rather than in one final pass (globally). Our experiments showed that for large input sizes, the local implementation yielded a speedup. The second implementation is similar except that it uses Tukey’s ‘ninther’, the median of the medians of three samples (each of three items), as the pivot; this quasi-best-of-nine version due to Bentley and McIlroy [BM93] was observed to be about 3% faster than the best-of-three version.
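A compact Python sketch of the two pivot rules and of the local insertion-sort cutover follows. It is our own illustration, not the authors' C code. The partitioning loop follows the usual Sedgewick-style scheme. The ninther sampling positions follow the commonly cited Bentley-McIlroy layout, falling back to plain best-of-three on short subarrays. The cutover value 16 is an arbitrary placeholder. The names `med3`, `best_of_three`, `ninther` and `quicksort` are hypothetical.

```python
def med3(a, i, j, k):
    """Index of the median of a[i], a[j], a[k]."""
    x, y, z = a[i], a[j], a[k]
    if x < y:
        if y < z:
            return j
        return k if x < z else i
    if x < z:
        return i
    return k if y < z else j

def best_of_three(a, lo, hi):
    return med3(a, lo, lo + (hi - lo) // 2, hi - 1)

def ninther(a, lo, hi):
    """Tukey's ninther: median of the medians of three samples of three items."""
    n = hi - lo
    if n <= 40:                         # short subarray: plain best-of-three
        return best_of_three(a, lo, hi)
    s = n // 8
    l, m, r = lo, lo + n // 2, hi - 1
    l = med3(a, l, l + s, l + 2 * s)
    m = med3(a, m - s, m, m + s)
    r = med3(a, r - 2 * s, r - s, r)
    return med3(a, l, m, r)

def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi):
        v, j = a[i], i - 1
        while j >= lo and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v

def quicksort(a, lo=0, hi=None, choose_pivot=best_of_three, cutover=16):
    if hi is None:
        hi = len(a)
    while hi - lo > cutover:
        p = choose_pivot(a, lo, hi)
        a[lo], a[p] = a[p], a[lo]       # pivot value x parked at a[lo]
        x, i, j = a[lo], lo, hi
        while True:                     # partition a[lo+1:hi] around x
            i += 1
            while i < hi and a[i] < x:
                i += 1
            j -= 1
            while a[j] > x:             # stops at lo at the latest, since a[lo] == x
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        a[lo], a[j] = a[j], a[lo]       # pivot to its final position j
        # recurse on the smaller side, loop on the larger (tail-recursion elimination)
        if j - lo < hi - j - 1:
            quicksort(a, lo, j, choose_pivot, cutover)
            lo = j + 1
        else:
            quicksort(a, j + 1, hi, choose_pivot, cutover)
            hi = j
    insertion_sort(a, lo, hi)           # local insertion sort, as subproblems arise
    return a
```

For example, `quicksort(data, choose_pivot=ninther)` selects the quasi-best-of-nine variant. Recursing only on the smaller side keeps the stack depth logarithmic, which is a Python-level analogue of the hand optimisations mentioned later in this section.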

For the partition sort and Chen’s algorithm, the block swaps (of blocks $\{x\} \cup S_R$ and $U_L$) are performed using Chen’s optimised implementation [Che96].

Best results for the partition sort were obtained with $\gamma$ about 128. The average bucket size is then quite large, and we found it significantly more efficient to sort the buckets with quicksort. Moderate variations of $\gamma$ had little impact on performance.

Again, with Chen’s algorithm, we added a cutover to quicksort on moderate-sized subproblems.¹ Our best performance arose with $\lambda = 2$, $\mu = 128$, $\gamma = 128$, and a quicksort cutover of about 500. Again, moderate variation of $\mu$ and $\gamma$ had little effect on the performance.

¹ Without this cutover we were unable to duplicate Chen’s experimental results [Che02]. He reported a 5-10% speedup compared to best-of-three quicksort on inputs of size up to 50,000.

We compared top-down and bottom-up drivers in our experiments, and found 
that top-down drivers tend to have smoother behaviour. We have therefore cho- 
sen to use top-down drivers in our implementations. 

For our algorithms, we resort to standard hand optimisations, such as elim- 
inating the tail recursions by branching (and other recursions by explicit stack 
management); we also inline the code for internal procedures. 

We ran each algorithm on a variety of problem sizes. Each algorithm, for 
each problem size, was run on the same collection of 25 randomly generated 
permutations of integers (drawn from the uniform distribution). Running times 
for the best choices of parameters are shown in the table below. 



Running times $T_n \pm \sigma(T_n)$ (μs)

| n | Partition | Chen | Quasi-best-of-9 | Best-of-3 |
|---|---|---|---|---|
| 20000 | 4906 ± 110 | 4878 ± 78 | 4840 ± 190 | 4868 ± 182 |
| 25000 | 6306 ± 107 | 6421 ± 185 | 6134 ± 183 | 6268 ± 183 |
| 30000 | 7652 ± 78 | 7863 ± 150 | 7493 ± 241 | 7805 ± 201 |
| 35000 | 9124 ± 200 | 9155 ± 217 | 8969 ± 315 | 9215 ± 252 |
| 40000 | 10577 ± 149 | 10689 ± 221 | 10406 ± 235 | 10601 ± 284 |
| 200000 | 65453 ± 764 | 67048 ± 7861 | 65294 ± 1467 | 87997 ± 6412 |
| 250000 | 84526 ± 1355 | 84551 ± 1269 | 84110 ± 914 | 86294 ± 2199 |
| 300000 | 103683 ± 354 | 104319 ± 3631 | 102909 ± 1264 | 105514 ± 2124 |
| 350000 | 123258 ± 1457 | 124211 ± 2542 | 122788 ± 1389 | 127300 ± 2900 |
| 400000 | 143115 ± 1024 | 144246 ± 2007 | 143029 ± 1622 | 147475 ± 2328 |
| 2000000 | 858194 ± 3038 | 862604 ± 5364 | 893321 ± 2779 | 897493 ± 6893 |
| 2500000 | 1093029 ± 2805 | 1100885 ± 4665 | 1107729 ± 12688 | 1148407 ± 10356 |
| 3000000 | 1344032 ± 9178 | 1348160 ± 9513 | 1360892 ± 2680 | 1404965 ± 11242 |
| 3500000 | 1583731 ± 5295 | 1596032 ± 8990 | 1602471 ± 9326 | 1663999 ± 9192 |
| 4000000 | 1833468 ± 4975 | 1845352 ± 9949 | 1854723 ± 12678 | 1921886 ± 2685 |



In our experiments, even for trials repeated only a few times, the deviations 
are orders of magnitude smaller than the average. This is most striking for the 
operation counts (not shown) where, with trials repeated 25 times, the deviations 
are typically below 0.1% of the averages for the new algorithms, and below 2% 
for the quicksorts. 
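A measurement loop in the spirit of the protocol described above might look as follows in Python. It reuses the same collection of 25 random permutations for every algorithm and size, and reports the mean and standard deviation of the running times in microseconds. This is a hypothetical harness, not the authors' C code. `time.perf_counter_ns`, `statistics.mean` and `statistics.stdev` are standard-library calls, while the function names and the seed are our own choices.

```python
import random
import statistics
import time

def make_trials(n, trials=25, seed=2004):
    """The same collection of random permutations, reused for every algorithm."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)                    # uniformly random permutation
        out.append(perm)
    return out

def time_sort(sort_fn, inputs):
    """Return (mean, standard deviation) of running times in microseconds."""
    times = []
    for data in inputs:
        work = data[:]                       # never sort the shared input in place
        start = time.perf_counter_ns()
        sort_fn(work)
        elapsed_us = (time.perf_counter_ns() - start) / 1000
        assert work == sorted(data)          # sanity check, outside the timed region
        times.append(elapsed_us)
    return statistics.mean(times), statistics.stdev(times)

inputs = make_trials(20000)
mean, sd = time_sort(list.sort, inputs)
print(f"n=20000: {mean:.0f} ± {sd:.0f} us")
```

Any in-place sorting routine taking a list can be passed as `sort_fn`. Reusing one fixed collection of inputs means that differences between algorithms are not confounded with differences between input samples.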

The experiments were conducted on a Sun UltraSPARC-IIe processor with CPU speed 550 MHz and cache size 512 KB. The programs, written in C, were run under SunOS (kernel version Generic_112233-08) with gcc (version 3.3.3) compilation at the -O3 level. Broadly similar results were obtained on a Pentium running GNU/Linux.
Abstract. We present a distributed approximation algorithm that computes in every graph $G$ a matching $M$ of size at least $\frac{2}{3}\beta(G)$, where $\beta(G)$ is the size of a maximum matching in $G$. The algorithm runs in $O(\log^4 |V(G)|)$ rounds in the synchronous, message-passing model of computation and matches the best known asymptotic complexity for computing a maximal matching in the same protocol. This improves the running time of an algorithm proposed recently by the authors in [2].



1 Introduction 

Any set of pairwise disjoint edges constitutes a matching of a given graph. From the algorithmic point of view, two basic problems arise in the context of matchings. One is to compute a matching of maximum cardinality (maximum matching); the other is to construct a matching not contained in any other matching of the given graph (inclusion-maximal matching). There is a rich literature concerning sequential algorithms for the maximum matching problem (see Lovász and Plummer [5] for an excellent survey). Several polynomial-time sequential algorithms for general graphs are known, and many of them use augmenting paths as a main tool. It has turned out, however, that no efficient parallel implementation is known even for the very simple sequential greedy algorithm. Many problems become even more difficult when the parallel network is not equipped with shared memory. The increasing importance of large-scale networks such as the Internet, ad hoc networks and sensor networks has motivated active research on distributed computing in message-passing systems. Adapting the existing sequential procedures results in very inefficient schemes, and a new algorithmic approach is needed.
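The difference between the two problems is already visible on a path with four vertices. A greedy pass that happens to pick the middle edge first produces a maximal matching of size 1, while the maximum matching has size 2. The Python sketch below is our own illustration, with a hypothetical function name, of that sequential greedy algorithm. A factor of 2 is also the worst case for any maximal matching, which is the gap that augmenting paths are used to close.

```python
def greedy_maximal_matching(edges):
    """Scan edges in the given order; keep an edge if both endpoints are still free."""
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

path = [("b", "c"), ("a", "b"), ("c", "d")]   # the path a-b-c-d, middle edge first
print(greedy_maximal_matching(path))              # [('b', 'c')]
print(greedy_maximal_matching([("a", "b"), ("b", "c"), ("c", "d")]))  # [('a', 'b'), ('c', 'd')]
```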

* Research supported by KBN grant no. 7 T11C 032 20

We consider the distributed model of computation introduced by Linial in [6]. In this model a network is represented by an undirected graph where each vertex stands for a processor and each edge corresponds to a connection between two processors in the network. Each processor has a unique ID and knows the number of vertices in the graph but not the global topology of the network. We assume full synchronization of the network: computation is performed in steps. In a single step, each processor can send a message to all of its neighbors, collect messages from its neighbors, and perform some local computations. Note that this model is completely different from the parallel PRAM model, where each computational unit can communicate with all other units at each step of the computation. As a result, many problems that admit efficient algorithms in the PRAM model are still open in the distributed model. An eminent example, which is a generalization of the problem studied in this work, is the Maximal Independent Set (MIS) problem. Efficient NC algorithms for MIS are well known ([7]), but whether there is an efficient distributed algorithm for the problem is one of the main open problems in the area ([6]). However, an efficient distributed algorithm for the related Maximal Matching (MM) problem is known, due to Hańćkowiak, Karoński, and Panconesi. In [4] the three authors proved the following theorem.

Theorem 1 ([4]). There is a distributed procedure Match which, in $O(\log^4 n)$ communication rounds in the synchronous, distributed model of computation, computes a maximal matching $M$ in any graph $G$ on $n$ vertices.
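To make the synchronous model described above concrete, the Python sketch below simulates rounds in which every vertex broadcasts one message to all its neighbours, collects the messages sent to it, and updates its local state. The example computation, in which each vertex learns the largest ID within distance $r$ after $r$ rounds, is only a toy illustration of the model and not part of procedure Match. All names are hypothetical.

```python
def run_synchronous(adj, state, rounds, message, update):
    """Simulate `rounds` synchronous steps of the message-passing model.

    adj:     dict vertex -> iterable of neighbours (the network topology)
    state:   dict vertex -> local state
    message: local state -> message broadcast to all neighbours
    update:  (local state, list of received messages) -> new local state
    """
    for _ in range(rounds):
        outgoing = {v: message(state[v]) for v in adj}           # all send at once
        inbox = {v: [outgoing[w] for w in adj[v]] for v in adj}   # all receive
        state = {v: update(state[v], inbox[v]) for v in adj}      # local computation
    return state

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ids = {v: v for v in path}
one = run_synchronous(path, ids, 1, lambda s: s, lambda s, msgs: max([s] + msgs))
three = run_synchronous(path, ids, 3, lambda s: s, lambda s, msgs: max([s] + msgs))
print(one)    # {0: 1, 1: 2, 2: 3, 3: 3}
print(three)  # {0: 3, 1: 3, 2: 3, 3: 3}
```

Computing every outgoing message before delivering any of them is what makes the simulation synchronous: no vertex sees a neighbour's state from the current round.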

The procedure Match is a starting point for further research in [2] and in this paper. In [2], building on ideas from [4], the authors designed an algorithm which finds a maximal matching whose size is relatively close to the maximum:

Theorem 2 ([2]). There is a distributed algorithm which in $O(\log^6 n)$ steps finds a matching $M$ in any graph $G$ on $n$ vertices such that

$$|M| \ge \frac{2}{3}\beta(G),$$

where $\beta(G)$ denotes the size of the largest matching in $G$.

In this paper, we propose a substantial modification of the algorithm from Theorem 2 which runs in $O(\log^4 n)$ steps and therefore has the same time complexity as the algorithm Match in Theorem 1. The following theorem is proved:

Theorem 3. There is a distributed procedure Matching which in $O(\log^4 n)$ steps finds in any graph $G = (V,E)$ with $n = |V(G)|$ vertices a matching $M$ such that

$$|M| \ge \frac{2}{3}\beta(G).$$

Although the approximation ratio of our result may not be very appealing, it should be remembered that the problem of computing an optimal solution in a general graph is not even known to be in NC. Moreover, the approximation algorithm given in [3] is inherently designed for the EREW PRAM.
Interestingly, the result is purely deterministic and, together with [4], stands among the few examples in distributed computing where polylogarithmic time complexity is achieved without the use of random bits. The $O(\log^6 n)$ complexity of the algorithm from Theorem 2 comes from sequential iterations over $O(\log^2 n)$ blocks (defined in the next section). It is this part of the algorithm which we improve. The modification that we present allows us to perform these computations in parallel, and as a result we drop the $O(\log^2 n)$ factor in the time complexity altogether.

Following the strategy from [2], we compute a maximal matching $M$ and then a maximal set of vertex-disjoint paths of length three that augment $M$. After augmenting $M$ along these paths we obtain a new matching $M'$ such that $|M'| \ge \frac{2}{3}\beta(G)$. To find a maximal matching we invoke the procedure Match from Theorem 1, and to compute a maximal set of disjoint paths we design a new procedure, which is the main contribution of this paper.

In the next section, we present definitions and notation which are necessary to 
develop the algorithm together with some preliminary lemmas. The last section 
contains the description of the algorithm and a sketch of the proof of its correct- 
ness. For complete proofs the reader is referred to the full version of this paper. 

2 Definitions, Notation, and Preliminary Lemmas 

In this section we fix some notation, introduce terminology, and state a few useful 
lemmas. As our algorithm creates an auxiliary multigraph, we define most of the 
terms in the multigraph setting. 

Let $M$ be a matching in a (multi)graph $G = (V,E)$. We say that a vertex $v$ is $M$-saturated if $v$ is an endpoint of some edge from $M$. An edge $e = \{u,v\}$ is $M$-saturated if either $u$ or $v$ is $M$-saturated. A path $P$ is $M$-alternating if it contains alternately edges from $M$ and from $E \setminus M$. A path $P$ of length $2k+1$, $k \ge 0$, augments $M$ if $P$ is $M$-alternating and no end of $P$ is $M$-saturated. A special role in the paper will be played by paths of length three augmenting $M$. A path is called an $(M,3)$-path if it augments $M$ and has length three. A set of paths is called independent if every two paths from the set are vertex-disjoint.
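The definition of an $(M,3)$-path translates directly into a check. The Python sketch below is an illustration with hypothetical names. It represents edges and the matching as sets of two-element frozensets, and a path as a sequence of four vertices.

```python
def is_m3_path(edges, M, path):
    """Check whether `path` (4 vertices) is an (M,3)-path.

    edges: set of frozensets {u, v}; M: set of frozensets forming a matching.
    """
    if len(path) != 4 or len(set(path)) != 4:
        return False                                  # a simple path of length three
    steps = [frozenset(path[i:i + 2]) for i in range(3)]
    if not all(e in edges for e in steps):
        return False
    # M-alternating: the middle edge is in M, the two outer edges are not
    if steps[0] in M or steps[1] not in M or steps[2] in M:
        return False
    saturated = {v for e in M for v in e}
    return path[0] not in saturated and path[3] not in saturated   # free ends

E = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
print(is_m3_path(E, {frozenset(("b", "c"))}, ("a", "b", "c", "d")))   # True
print(is_m3_path(E, {frozenset(("a", "b"))}, ("a", "b", "c", "d")))   # False
```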

Just like in [2], our algorithm is based on the following facts that are standard 
in graph theory. 

Claim 4

(a) Let $M$ be an arbitrary matching in $G$. If there are no paths of length at most three augmenting $M$, then

$$|M| \ge \frac{2}{3}\beta(G).$$

(b) Let $M$ be a maximal matching in a graph $G$ and let $\mathcal{P}$ be a maximal independent set of paths of length three augmenting $M$. Further, let $M' = M \oplus \bigcup_{P \in \mathcal{P}} E(P)$ be the matching obtained by augmenting $M$ along the paths from $\mathcal{P}$. Then there are no paths of length at most three augmenting $M'$ in $G$.
Consequently, parts (a) and (b) of Claim 4 together ensure that the matching $M'$ satisfies

$$|M'| \ge \frac{2}{3}\beta(G).$$

In the process of constructing the matching we will use the notion of a substantial 
matching. 

Definition 1. Let $M$ be a matching in a multigraph $G = (V, E)$ and $\gamma > 0$. A matching $M$ is $\gamma$-substantial in $G$ if the number of $M$-saturated edges of $G$ is at least $\gamma|E|$.
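Definition 1 counts edges with multiplicity. The minimal Python check below assumes, as a representation choice of ours, that the multigraph is given as a list of vertex pairs in which parallel edges are simply repeated. The function names are hypothetical.

```python
def saturated_edge_count(edges, M):
    """Number of (multi)edges with at least one M-saturated endpoint."""
    saturated = {v for e in M for v in e}
    return sum(1 for u, w in edges if u in saturated or w in saturated)

def is_substantial(edges, M, gamma):
    """Definition 1: at least gamma * |E| edges are M-saturated (multiplicity counted)."""
    return saturated_edge_count(edges, M) >= gamma * len(edges)

E = [("x", "y"), ("x", "y"), ("x", "z"), ("w", "v")]   # a parallel pair x-y
M = [("x", "y")]
print(saturated_edge_count(E, M))                            # 3
print(is_substantial(E, M, 0.5), is_substantial(E, M, 0.8))  # True False
```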

Using an algorithm from [4] (with minor changes), we can find a substantial 
matching in a bipartite multigraph. 

Lemma 1. For all constants $C > 0$ and $0 < \gamma < 1$ there exists a distributed algorithm that finds in $O(\log^3 n)$ steps a $\gamma$-substantial matching in a bipartite multigraph $G = (L, R, F)$, where $|F| \le n^C$ and $|L| + |R| = n$.

Similarly, we will also use the notion of a substantial set of paths. Let $\mathcal{P}_3(M)$ denote the set of all $(M,3)$-paths in a graph $G = (V, E)$.

Definition 2. Let $M$ be a matching in $G$ and $\gamma > 0$. A set $\mathcal{P}$ of $(M,3)$-paths is called $\gamma$-path-substantial in $G$ if the number of paths from $\mathcal{P}_3(M)$ that have a common vertex with some path in $\mathcal{P}$ is at least $\gamma|\mathcal{P}_3(M)|$.

Now we will introduce more technical terminology. Given a bipartite multigraph $H = (L, R, E)$, we partition its left-hand side $L$ into sets $L_i = \{u \in L : \frac{D_i}{2} < \deg_H(u) \le D_i\}$ for $D_i = 2^i$ and $i \ge 0$. The submultigraph of $H$ induced by the edges incident to $L_i$, denoted by $H_i = (L_i, N(L_i), E_i)$, is called a $D_i$-block. A key concept which will be used in our approach is the concept of a spanner.
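The partition of $L$ into the classes $L_i$ depends on degrees alone. For degree $d \ge 1$, the index is the smallest $i$ with $d \le 2^i$, which is the bit length of $d - 1$. The Python sketch below assumes edges are given as (left vertex, right vertex) pairs with parallel edges repeated. Both this representation and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def d_blocks(edges):
    """Group the edges of a bipartite multigraph into D_i-blocks.

    Left vertex u belongs to L_i when D_i/2 < deg(u) <= D_i with D_i = 2**i;
    the D_i-block consists of all edges incident to L_i.
    """
    deg = Counter(u for u, _ in edges)                  # left degrees, with multiplicity
    level = {u: (d - 1).bit_length() for u, d in deg.items()}   # smallest i: d <= 2**i
    blocks = defaultdict(list)
    for u, w in edges:
        blocks[2 ** level[u]].append((u, w))
    return dict(blocks)

edges = [("u1", "r1"),
         ("u2", "r1"), ("u2", "r1"), ("u2", "r2"),
         ("u3", "r1"), ("u3", "r2"), ("u3", "r3"), ("u3", "r4"),
         ("u4", "r1"), ("u4", "r2")]
print({D: len(es) for D, es in sorted(d_blocks(edges).items())})   # {1: 1, 2: 2, 4: 7}
```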

Definition 3. Let $H = (A, B, E)$ be a $D$-block for some $D$ as defined above. An $(\alpha, K)$-spanner from $A$ to $B$ is a subgraph $S = (A', B, E')$ of $H$ such that the following conditions are satisfied.

1. $|A'| \ge \alpha|A|$.

2. For every vertex $a \in A'$, $\deg_S(a) = 1$.

3. For every vertex $b \in B$, $\deg_S(b) \le \frac{K}{D}\deg_H(b) + 1$.

In other words, a spanner is a collection of stars such that the degrees of the 
centers of the stars are appropriately bounded in terms of their degrees in H. 
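The three spanner conditions are easy to check centrally. The Python sketch below is an illustration with hypothetical names, taking edges as (a, b) pairs with parallel edges repeated. It only verifies a given candidate and does not compute one, since the distributed construction from [4] is not reproduced here.

```python
from collections import Counter

def is_spanner(H_edges, A, S_edges, alpha, K, D):
    """Check the conditions of an (alpha, K)-spanner from A to B (Definition 3)."""
    H, S = Counter(H_edges), Counter(S_edges)
    if any(S[e] > H[e] for e in S):
        return False                                   # S is not a subgraph of H
    deg_S_a = Counter(a for a, _ in S_edges)
    deg_S_b = Counter(b for _, b in S_edges)
    deg_H_b = Counter(b for _, b in H_edges)
    A_prime = set(deg_S_a)                             # vertices of A covered by S
    cond1 = len(A_prime) >= alpha * len(A)
    cond2 = all(deg_S_a[a] == 1 for a in A_prime)      # each leaf has one ray
    cond3 = all(deg_S_b[b] <= K / D * deg_H_b[b] + 1 for b in deg_S_b)
    return cond1 and cond2 and cond3

# A 2-block: every vertex of A has degree 2 in H.
H = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
     ("a3", "b1"), ("a3", "b3"), ("a4", "b2"), ("a4", "b3")]
A = {"a1", "a2", "a3", "a4"}
stars = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b3")]
print(is_spanner(H, A, stars, alpha=0.5, K=1, D=2))   # True
```

In the example, the star centred at b3 has two leaves, which is exactly the allowed $\frac{K}{D}\deg_H(b_3) + 1 = 2$.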

Note that spanners played an important role in designing the algorithm for maximal matching in [4]. Although the procedures in [4] are formulated for simple graphs, they can be adapted to multigraphs with only minor changes. In particular, we have the following fact.

Lemma 2 ([4]). For every constant $C$ and input parameter $D$ there exist a constant $K = K(C)$ and a distributed algorithm that finds in $O(\log^3 n)$ steps a $(\frac{1}{2}, K)$-spanner in a multigraph $H = (A, B, E)$ that is a $D$-block with $|E| \le n^C$ and $n = |A| + |B|$.
Specifically, in our case C = 4 and K < 100.

In order to motivate the following definitions, we spend some time explaining the main idea of our algorithm.

The approach used to find a set of augmenting paths is based on the following strategy. First we find a maximal matching $M$ using the procedure Match given in Theorem 1. Then the maximal independent set of $(M,3)$-paths is gradually constructed in rounds. In each round a substantial set of $(M,3)$-paths is found. It turns out that we can find a substantial set of $(M,3)$-paths in a special layered graph. For that reason, from the input graph we obtain a virtual auxiliary graph which has this layered form. Finally, once the paths are found in the layered graph, we translate them back to the original graph.

The layered graph will be a simple graph which has four layers of vertices, and we will refer to it as a 4L-graph. A precise construction of a 4L-graph is given in procedure Reduce in the next section; here we just introduce its structural features. Every 4L-graph $H = H(G, M)$, with respect to $G$ and a maximal matching $M$, has the vertex set $V(H)$ partitioned into four nonempty sets $X_1, X_2, X_3, X_4$ so that $H[X_2, X_3] = M$, every vertex from $X_1$ is adjacent only to vertices from $X_2$, and every vertex from $X_4$ is adjacent only to vertices from $X_3$. In other words, $H$ consists of 3 bipartite graphs. The edges of $H[X_1, X_2]$ and $H[X_3, X_4]$ are potential ingredients of the $(M,3)$-paths extending $M$. The desired structure of a 4L-graph is obtained by first removing from $E(G)$ all edges which violate the above properties. We will distinguish in the input graph $G$ three types of edges with respect to $M$:

(a) edges that are in M, 

(b) edges that have exactly one endpoint in M (these edges may form triangles based on an edge from M),

(c) edges that are not in M but have both endpoints in M. 

Note that edges from (c) do not belong to any $(M,3)$-path of $G$ and therefore we can delete them. Next, the triangles built of edges of type (b) are carefully destroyed (see procedure Reduce). Finally, some vertices from $V(G) \setminus V(M)$ are replicated to have their counterparts in $X_4$ and $X_1$, respectively.
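The three edge types depend only on $M$ and on the set $V(M)$ of vertices it saturates. The Python sketch below uses illustrative names. It assumes $M$ is maximal, so that no edge has both endpoints outside $V(M)$, and raises an error otherwise.

```python
def classify_edges(edges, M):
    """Split the edges of G into types (a), (b), (c) with respect to the matching M."""
    matching = {frozenset(e) for e in M}
    in_M = {v for e in matching for v in e}           # the vertex set V(M)
    types = {"a": [], "b": [], "c": []}
    for u, v in edges:
        if frozenset((u, v)) in matching:
            types["a"].append((u, v))                 # an edge of M
        elif (u in in_M) != (v in in_M):
            types["b"].append((u, v))                 # exactly one endpoint in V(M)
        elif u in in_M:
            types["c"].append((u, v))                 # both endpoints in V(M), not in M
        else:
            raise ValueError("M is not maximal: edge %r has two free endpoints" % ((u, v),))
    return types

edges = [("x", "y"), ("x", "z"), ("y", "z"), ("p", "q"), ("x", "p"), ("q", "w")]
M = [("x", "y"), ("p", "q")]
print(classify_edges(edges, M))
```

In this example the edges x-z and y-z form a triangle with the matching edge x-y, the kind of type (b) structure that procedure Reduce breaks up, while x-p is of type (c) and can be discarded.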

Let $H = (X_1, X_2, X_3, X_4, E)$ be a 4L-graph. Now we introduce an auxiliary multigraph $Mul(H) = (X_1, X_2, X_3, X_4, F)$ defined as follows.

Definition 4. Let $H = (X_1, X_2, X_3, X_4, E)$ be a 4L-graph and, for every $e \in H[X_1, X_2] \cup H[X_3, X_4]$, let $w_e$ denote the number of $(M,3)$-paths in $H$ containing $e$. The multigraph $Mul(H) = (X_1, X_2, X_3, X_4, F)$ is the graph obtained from $H$ by putting $w_e$ copies of $e$ in $F$ for every edge $e \in H[X_1, X_2] \cup H[X_3, X_4]$, and one copy of $e$ in $F$ for every edge $e \in H[X_2, X_3]$.

The main property of $Mul(H)$ is that edges in $Mul(H)[X_1, X_2]$ and, similarly, in $Mul(H)[X_3, X_4]$ correspond to $(M,3)$-paths in $H$. In the course of our proof we will make use of it by finding a substantial matching in $Mul(H)[X_1, X_2]$ and deducing a substantial property for a set of $(M,3)$-paths built on the matching. Directly from Definition 4 we can derive the following fact.
Fact 5. For every $e = \{a, a'\} \in H[X_2, X_3]$, $\deg_{Mul(H)}(a) = \deg_{Mul(H)}(a')$, which is equal to the number of $(M,3)$-paths containing $e$. Moreover, there is a one-to-one correspondence between the edges in $Mul(H)[X_1, X_2]$ and the $(M,3)$-paths in $H$.
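Definition 4 and Fact 5 can be checked mechanically on a small 4L-graph. The Python sketch below uses illustrative names and assumes, as in a 4L-graph, that $X_1$ and $X_4$ are different vertex sets. Under that assumption every choice $x_1 - x_2 - x_3 - x_4$ with $\{x_2, x_3\} \in M$ is an $(M,3)$-path. The multiplicity of an edge $\{x_1, x_2\}$ is then the number of $X_4$-neighbours of the $M$-partner of $x_2$, and symmetrically for $\{x_3, x_4\}$. Degrees in the test count only the $X_1$- or $X_4$-side copies; the single copy of the $M$-edge adds 1 to both sides equally.

```python
from collections import Counter

def mul_multiplicities(E12, M, E34):
    """Return w_e for every edge of H[X1,X2] and H[X3,X4] (Definition 4).

    E12: (x1, x2) pairs, M: (x2, x3) pairs, E34: (x3, x4) pairs.
    Assumes X1 and X4 are disjoint, so every x1-x2-x3-x4 is an (M,3)-path.
    """
    mate = {}
    for x2, x3 in M:
        mate[x2], mate[x3] = x3, x2
    up = Counter(x2 for _, x2 in E12)        # number of X1-neighbours of each x2
    down = Counter(x3 for x3, _ in E34)      # number of X4-neighbours of each x3
    w = {}
    for x1, x2 in E12:                       # a path through (x1, x2) picks any x4
        w[(x1, x2)] = down[mate[x2]]
    for x3, x4 in E34:                       # a path through (x3, x4) picks any x1
        w[(x3, x4)] = up[mate[x3]]
    return w

E12 = [("p", "a"), ("q", "a"), ("q", "c")]
M = [("a", "b"), ("c", "d")]
E34 = [("b", "r"), ("b", "s"), ("b", "t"), ("d", "r")]
print(mul_multiplicities(E12, M, E34))
```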

To this end, in $Mul(H)$, we define a 4L-D-block. The actual process of constructing the set of $(M,3)$-paths will be performed in blocks that partition the sets $X_2$ and $X_3$.

Definition 5. Let $R$ be the set of all edges $e = \{a, a'\} \in Mul(H)[X_2, X_3]$ that satisfy

$$\frac{D}{2} < \deg_{Mul(H)}(a) = \deg_{Mul(H)}(a') \le D,$$

and let $X'_2 = \left(\bigcup_{e \in R} e\right) \cap X_2$, $X'_3 = \left(\bigcup_{e \in R} e\right) \cap X_3$. A 4L-sub-multigraph $B_D(H) = (X_1, X'_2, X'_3, X_4, F')$ of $Mul(H) = (X_1, X_2, X_3, X_4, F)$ which contains all edges incident to edges from $R$ is called a 4L-D-block.

It is worth noticing here that, considering $D = D(i) = 2^i$ for $i = 0, 1, \ldots, \log n$, the 4L-blocks are edge-disjoint and partition the sets $X_2$ and $X_3$. What is more, every subgraph $B_{D(i)}(H)[X_3, X_4]$ is a $D(i)$-block.

3 Algorithm 

In this section we present the main algorithm. In each iteration it constructs a 
substantial set of (M, 3)-paths and adds them to the global set of independent 
(M, 3)-paths. Then the paths are removed from the graph together with all edges 
incident to them. It stops when no such paths are left in the graph and outputs 
a new matching M* . Below we present an outline of the algorithm to give some 
intuition behind our approach. 



Procedure Matching

Input: Graph $G$
Output: Matching $M^*$

1. Find a maximal matching $M$ in $G$.

2. For $j := 1$ to $O(\log n)$ do

a) Construct a 4L-graph $\hat{G} = \hat{G}(G, M) = (X_1, X_2, X_3, X_4, E)$ from $G$ and $M$.

b) Construct a multigraph $\tilde{G} = Mul(\hat{G}) = (X_1, X_2, X_3, X_4, F)$ with multiple edges in $\tilde{G}[X_1, X_2]$ and $\tilde{G}[X_3, X_4]$.

c) Find a $\delta$-path-substantial set $\tilde{\mathcal{P}}$ of disjoint $(M,3)$-paths in $\tilde{G}$ using spanners in blocks in parallel.

d) Translate $\tilde{\mathcal{P}}$ to a set $\mathcal{P}$ of paths in $G$.

e) Accumulate the $(M,3)$-paths from $\mathcal{P}$.

f) Update $G$ (remove from $G$ the paths in $\mathcal{P}$ with all edges incident to them).

3. Update $M^*$ by augmenting $M$ along the paths from $\mathcal{P}$.
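For intuition, the Python sketch below is a centralised, sequential analogue of Procedure Matching. A greedy maximal matching stands in for Match, and a greedy scan over all $(M,3)$-paths stands in for the distributed, block-parallel construction of a maximal independent set of paths. The sketch says nothing about round complexity. It only illustrates, on small graphs, the guarantee $|M^*| \ge \frac{2}{3}\beta(G)$ that follows from Claim 4, with $\beta(G)$ computed by exhaustive search. All names are hypothetical.

```python
from functools import lru_cache

def greedy_maximal_matching(edges):
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

def m3_paths(adj, M):
    """All (M,3)-paths a-x-y-d with {x, y} in M and a, d free, a != d."""
    mate = {}
    for u, v in M:
        mate[u], mate[v] = v, u
    paths = []
    for b, c in M:
        for x, y in ((b, c), (c, b)):
            for a in adj[x]:
                if a in mate:
                    continue
                for d in adj[y]:
                    if d not in mate and d != a:
                        paths.append((a, x, y, d))
    return paths

def sequential_matching(n, edges):
    """Vertices are 0..n-1. Returns M* as a set of frozenset edges."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    M = greedy_maximal_matching(edges)
    used, chosen = set(), []
    for p in m3_paths(adj, M):                 # greedy maximal independent set of paths
        if used.isdisjoint(p):
            chosen.append(p)
            used.update(p)
    result = {frozenset(e) for e in M}
    for a, b, c, d in chosen:                  # augment: M xor E(P)
        result.discard(frozenset((b, c)))
        result.update((frozenset((a, b)), frozenset((c, d))))
    return result

def maximum_matching_size(n, edges):
    """beta(G) by exhaustive search over vertex subsets (small n only)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    @lru_cache(maxsize=None)
    def best(mask):
        if mask == 0:
            return 0
        v = (mask & -mask).bit_length() - 1    # lowest vertex still available
        rest = mask & ~(1 << v)
        value = best(rest)                     # leave v unmatched
        for u in adj[v]:
            if rest >> u & 1:
                value = max(value, 1 + best(rest & ~(1 << u)))
        return value

    return best((1 << n) - 1)

print(sequential_matching(4, [(1, 2), (0, 1), (2, 3)]))   # matches 0-1 and 2-3
```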
The algorithm consists of a procedure that computes a substantial matching, and two procedures from [2]. First we describe a new procedure which computes a substantial set of $(M,3)$-paths in a given 4L-graph. Then we recall two procedures from [2]; the first one reduces a graph $G = (V, E)$ and a maximal matching $M$ in $G$ to a 4L-graph $\hat{G} = (X_1, X_2, X_3, X_4, E)$, and the second one translates a $\delta$-substantial set $\tilde{\mathcal{P}}$ of independent $(M,3)$-paths in $\hat{G}$ to a substantial set $\mathcal{P}$ of independent $(M,3)$-paths in $G$.

3.1 Algorithm in a 4L-Graph 

In this section, we present the main part of our algorithm, Procedure PathsIn4LGraph. This procedure finds a $\delta$-path-substantial set $\mathcal{P}$ of disjoint $(M,3)$-paths in a layered graph. We build the paths in two phases. In the first phase we choose candidates for edges in $G[X_3, X_4]$ which will augment the matching $M$. This is performed by first constructing a star forest with all stars having centers in $X_4$. After the second phase at most one edge from every star is added to the set of $(M,3)$-paths. Therefore we move along the star rays to the set $X_2$ and introduce an auxiliary bipartite graph with vertices corresponding to the stars we just created on one side and $X_1$ on the other. In the second phase we move our attention to this graph and find a substantial matching in it. After extending the matching uniquely to the graph $G[X_3, X_4]$ we obtain the set $\mathcal{P}$ of disjoint $(M,3)$-paths.

Phase A: Steps 1-5 construct a star forest S' in $\hat G[X_3, X_4]$.

Phase B: Steps 6-8 construct the set $\tilde{\mathcal P}$ using S'.

Note that Phase B is analogous to the main algorithm for the 4L-graph in [2].
However, the blocks in PathsIn4LGraph are defined in a new way compared with [2],
so the analysis is different. The main problem in PathsIn4LGraph is to design the star
forest S' so that the stars can be glued together and used as super-vertices in Phase B.

The properties of S' that we need in order to prove the algorithm correct are quite
delicate. On the one hand, we must delete some edges of a union of spanners S to make
the analysis work. On the other hand, we need to keep most of them to retain the
substantial property. In particular, if a star has exactly one vertex in a block, then it
must have no vertices in any other block. This is taken care of in steps (4) and (5) of
the procedure given below.



Procedure PathsIn4LGraph 

Input: 4L-graph $\tilde G = (X_1, X_2, X_3, X_4, \tilde E)$, where $\tilde G[X_2, X_3] = M$.

Output: A set $\tilde{\mathcal P}$ of disjoint (M, 3)-paths which is $\xi$-path-substantial in $\tilde G$ for
some constant $\xi > 0$.

1. Construct the 4L-multigraph $\hat G := Mul(\tilde G)$ as in Definition 4.

2. For i = 0 to 4 log n do D(i) := 2^i,

$B_i := B_{D(i)}(\hat G) = (X_1, X_2(i), X_3(i), X_4, F_i)$,

where $X_j(i)$ is a subset of $X_j$ corresponding to the block $B_i$, for j = 2, 3.

3. In parallel, for every i: let K = K(4) be a constant in Lemma 2. Find a
(…, K)-spanner $S_i$ from $X_3(i)$ to $X_4$ in $B_i[X_3(i), X_4]$ (using the procedure
from [4]).

4. Let $S = \bigcup_{i=0}^{4\log n} S_i$ be a star forest.

a) For any star $Q \in S$ let $Q_i := Q \cap S_i$.

b) For a star Q in S let $I_Q := \{i : |Q_i| = 1\}$ (i.e. $Q_i$ is an edge) and let
$J_Q := \{i : |Q_i| > 1\}$.

c) Let $i_Q$ be the number of edges of $\tilde G[X_3, X_4]$ that are incident to vertices
from $\bigcup_{i \in I_Q} V(Q_i) \cap X_3$. Let $j_Q$ be the number of edges of $\tilde G[X_3, X_4]$ that are
incident to vertices from $\bigcup_{i \in J_Q} V(Q_i) \cap X_3$.

5. In parallel, for every star $Q \in S$:

a) If $i_Q > j_Q$, then delete from Q all the edges that belong to $\bigcup_{i \in J_Q} Q_i$. In
addition, select $k \in I_Q$ such that $k = \max I_Q$ and delete from Q all stars
$Q_i$ for which $i \ne k$. As a result Q is the edge $Q_k$.

b) If $i_Q \le j_Q$, delete from Q all the edges that belong to $\bigcup_{i \in I_Q} Q_i$.

Let S' denote the star forest after these operations.

6. Construct the following auxiliary multigraph $\bar G = (X_1, \bar X_2, \bar E)$ with
$\bar X_2 = \{N_{X_2}(Q') : Q' \in S'\}$ and the set of edges
$\bar E = \{\{u, v\} \in \tilde E : u \in X_1 \text{ and } v \in N_{X_2}(Q') \text{ for some } Q' \in S'\}$.

7. Find a 1/2-substantial matching M' in $\bar G$.

8. Extend (in a unique way) every edge of M' to a path P of length three in
the graph $\tilde G$, using an edge of matching M and an edge of a star in S'. $\tilde{\mathcal P}$ is the
set of all paths P obtained from M'.
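The per-star pruning in Steps 4(b) to 5 can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' distributed implementation:

- A star Q is given as a dictionary mapping a block index i to the list of edges of $Q_i$, with each edge a pair (x3, x4) whose second entry is the centre.
- `deg3` is a hypothetical callback returning the degree of an $X_3$ vertex in $\tilde G[X_3, X_4]$.
- $J_Q$ is taken to be the set of blocks in which $Q_i$ has more than one edge.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (leaf in X_3, centre in X_4)


def prune_star(q_blocks: Dict[int, List[Edge]],
               deg3: Callable[[Hashable], int]) -> List[Edge]:
    """Sketch of Steps 4(b)-5 of PathsIn4LGraph for a single star Q."""
    # Step 4(b): I_Q = blocks where Q_i is a single edge, J_Q = blocks with more edges.
    i_blocks = [i for i, es in q_blocks.items() if len(es) == 1]
    j_blocks = [i for i, es in q_blocks.items() if len(es) > 1]

    def incident_edges(blocks: List[int]) -> int:
        # Step 4(c): every edge of G~[X_3, X_4] has exactly one endpoint in X_3,
        # so the number of edges incident to a set of X_3 vertices is a degree sum.
        leaves = {x3 for i in blocks for (x3, _) in q_blocks[i]}
        return sum(deg3(v) for v in leaves)

    i_q = incident_edges(i_blocks)
    j_q = incident_edges(j_blocks)
    if i_q > j_q:
        # Step 5(a): drop all of J_Q and keep only the edge Q_k with k = max I_Q.
        return list(q_blocks[max(i_blocks)])
    # Step 5(b): drop every single-edge piece Q_i with i in I_Q.
    return [e for i in sorted(j_blocks) for e in q_blocks[i]]
```

Choosing the largest block index in Step 5(a) makes the surviving edge deterministic, which matters when all stars are processed in parallel.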



We will need some more notation in the next two lemmas. Recall that S is
a set of stars with centers in $X_4$. Let $F_4$ be the set of vertices of stars from S
which are in $X_4$, and $F_3$ the set of vertices of stars in S which are in $X_3$. In addition,
let $F_2 := N(F_3) \cap X_2$. Similarly, by considering S', we define $F'_4$, $F'_3$ and $F'_2$.

The following lemma estimates the number of edges that we may lose while 
concentrating on the stars only. 

Lemma 3. 1. $e_{\tilde G}(F_3, X_4) \ge \lambda\, e_{\tilde G}(X_3, X_4)$,

2. $e_{\tilde G}(X_1, F_2) \ge \lambda\, e_{\tilde G}(X_1, X_2)$.

Proof (Sketch). By Fact 5, the proof of both parts is the same, and it follows
from a property of the spanners $S_i$ (see Part 1 of Definition 3).

Lemma 4. Let K = K(4) be given in Lemma 2 and let $\xi = \xi(K) > 0$ be a suitable
constant depending only on K. The set of paths $\tilde{\mathcal P}$ found by PathsIn4LGraph is
$\xi$-path-substantial in $\tilde G$. The algorithm PathsIn4LGraph runs in O(log^3 n) steps.
Proof (Sketch). First note that the time complexity comes from finding spanners
in all the blocks (in parallel) in step (3). By Lemma 2, this takes O(log^3 n) steps.

Let $\tilde{\mathcal P}$ be the set of (M, 3)-paths found by the procedure PathsIn4LGraph. In view
of Fact 5, to prove that $\tilde{\mathcal P}$ is $\xi$-path-substantial, we need to show one of two things:
either a $\xi$-fraction of the edges in $\tilde G[X_1, X_2]$ is incident to edges of paths from $\tilde{\mathcal P}$,
or a $\xi$-fraction of the edges in $\tilde G[X_3, X_4]$ is incident to edges of paths from $\tilde{\mathcal P}$.

For that we consider the matching M' in $\bar G$ found in Step (7). We also consider the
set W of vertices in $X_2$ that lie in M'-saturated supervertices but are not themselves
saturated by M'. We then distinguish two cases based on the number of edges incident
to W relative to $e_{\tilde G}(X_1, X_2)$.

If this number is relatively small, then the fact that M' is 1/2-substantial, combined
with Lemma 3, implies the required bound. The second case is a bit more complicated.

We move our attention to the layers $X_3$ and $X_4$. Using the properties of the star
forest S' and the upper bound on the degree of spanners, we show that at least a
constant fraction (depending on K) of the edges in $E_{\tilde G}(X_3, X_4)$ is incident to
centers of stars saturated by paths from $\tilde{\mathcal P}$. The two cases yield the lemma with a
constant $\xi$ depending only on K.



3.2 Reduction, Translation, and Modification 

As indicated in previous sections, our main algorithm can be summarized as
follows. First compute a maximal matching M. Next, iterate the following steps
O(log n) times:

(1) Reduce the graph G and the matching M to a layered graph $\tilde G$.

(2) Find a set $\tilde{\mathcal P}$ of disjoint (M, 3)-paths in $\tilde G$ using PathsIn4LGraph.

(3) Translate the paths from $\tilde{\mathcal P}$ to paths $\mathcal P$ in G, maintaining the substantial property.

(4) Modify G and $\tilde G$ with respect to $\mathcal P$.

In this section, we present the procedures:

1. Reduce, which reduces G and M to a 4L-graph $\tilde G$;

2. Translate, which translates $\tilde{\mathcal P}$ to a set of paths $\mathcal P$ in G;

3. Modify, which modifies G with respect to $\mathcal P$.

Procedures Reduce and Modify are similar to procedures in [2]; Translate
differs from the corresponding procedure in [2] only in small details.



Procedure Reduce 

Input: Graph G and a maximal matching M in G.

Output: 4L-graph $\tilde G$.

1. For every $e \in E(G)$ (in parallel), check whether e is contained in at least one (M, 3)-path.
If it is not, delete e. In particular, all edges from $E(G) \setminus M$ that have
both endpoints in V(M) are deleted.

2. For every edge $m = \{m_1, m_2\} \in M$ with $ID(m_1) < ID(m_2)$, let $T_m$ be the
set of vertices in V that are adjacent to both $m_1$ and $m_2$.
Partition $T_m$ into two groups $T_{m,1}$ and $T_{m,2}$ so that $\big||T_{m,1}| - |T_{m,2}|\big| \le 1$.
For every vertex $t \in T_{m,1}$ delete the edge $\{t, m_2\}$ from the graph; for every
vertex $t \in T_{m,2}$ delete $\{t, m_1\}$. As a result, the "new" graph G' does not
have triangles based on edges from M. From now on we operate on G'.

3. For every edge $m = \{m_1, m_2\} \in M$ with $ID(m_1) < ID(m_2)$:
if $e \in E(G')$ and $e = \{v, m_1\}$ for some $v \ne m_2$, then orient e from v to $m_1$, that is, delete
e and add the arc $(v, m_1)$. If $e \in E(G')$ and $e = \{v, m_2\}$ for some $v \ne m_1$, then
orient e from $m_2$ to v, that is, delete e and add the arc $(m_2, v)$.

4. For every vertex $v \in V(G) \setminus V(M)$, split v into two siblings $v^-$ and $v^+$,
where $v^-$ inherits all the arcs that start in v and $v^+$ inherits the arcs that end in v.
Finally, ignore the orientation on the edges. As a result we obtain a 4L-graph
$\tilde G = (V^-, V_1(M), V_2(M), V^+)$, where

- $V^- = \{v^- : v \in V(G) \setminus V(M)\}$,
- $V^+ = \{v^+ : v \in V(G) \setminus V(M)\}$,
- $V_1(M)$ and $V_2(M)$ are the corresponding endpoints of M; that is, if $m = \{m_1, m_2\} \in M$ and
$ID(m_1) < ID(m_2)$, then $m_1 \in V_1(M)$ and $m_2 \in V_2(M)$.
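Steps 2 to 4 of Reduce can be sketched sequentially in Python. This is a hedged illustration, not the distributed procedure of [2]. It makes the following assumptions:

- G is a symmetric adjacency dictionary and `vid` gives distinct integer IDs.
- Splitting $T_m$ by sorted ID is an arbitrary choice; the procedure only asks for a balanced split.
- The layer tags "minus", "V1", "V2", "plus" are hypothetical labels for $V^-$, $V_1(M)$, $V_2(M)$, $V^+$.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Vertex = Hashable
Node = Tuple[str, Vertex]  # (layer tag, original vertex)


def reduce_to_4l(adj: Dict[Vertex, Set[Vertex]],
                 matching: Iterable[Tuple[Vertex, Vertex]],
                 vid: Dict[Vertex, int]) -> Set[Tuple[Node, Node]]:
    """Sketch of Reduce, Steps 2-4, on an undirected simple graph."""
    g = {v: set(ns) for v, ns in adj.items()}  # working copy, becomes G'
    # Order every matching edge so that ID(m1) < ID(m2).
    m_edges = [tuple(sorted(e, key=lambda v: vid[v])) for e in matching]
    matched = {v for e in m_edges for v in e}

    # Step 2: destroy the triangles t-m1-m2 by a balanced split of T_m.
    for m1, m2 in m_edges:
        t_m = sorted(g[m1] & g[m2], key=lambda v: vid[v])
        half = len(t_m) // 2  # the two groups differ in size by at most one
        for t in t_m[:half]:  # T_{m,1}: delete {t, m2}
            g[t].discard(m2)
            g[m2].discard(t)
        for t in t_m[half:]:  # T_{m,2}: delete {t, m1}
            g[t].discard(m1)
            g[m1].discard(t)

    # Steps 3-4: arc v -> m1 becomes edge v^- -- m1, arc m2 -> v becomes m2 -- v^+.
    edges: Set[Tuple[Node, Node]] = set()
    for m1, m2 in m_edges:
        edges.add((("V1", m1), ("V2", m2)))
        for v in g[m1] - matched:
            edges.add((("minus", v), ("V1", m1)))
        for v in g[m2] - matched:
            edges.add((("V2", m2), ("plus", v)))
    return edges
```

In a small example with matching edge {x, y}, the path a x y b becomes $a^-\,x\,y\,b^+$, matching Fact 6. A common neighbour t of x and y keeps only one of its two edges, so it can no longer close a triangle.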



The above procedure obtains $\tilde G$ by first deleting all triangles which are based
on edges from M and then using the orientation to split M-unsaturated vertices into
$V^-$ and $V^+$ (see [2] for details). There are two main properties of Reduce that
are useful for our analysis.

Lemma 5. Let $m_3$ denote the number of (M, 3)-paths in G, $m'_3$ the number of
(M, 3)-paths in G', and $\tilde m_3$ the number of (M, 3)-paths in $\tilde G$. Then

1. $\tilde m_3 = m'_3$,

2. $m_3/4 \le m'_3$.

Fact 6. If $v_1^-\, m_1\, m_2\, v_2^+$ is an (M, 3)-path in $\tilde G$, then $v_1\, m_1\, m_2\, v_2$ is an (M, 3)-path in
G.



Our next procedure modifies G with respect to the set of (M, 3)-paths $\mathcal P$.
The procedure deletes all edges which are incident to paths from $\mathcal P$. As a result,
it is possible that some edges present in the modified graph G' will not belong 
to any (M, 3)-path of G' . These edges are subsequently deleted in step (1) of 
Reduce. 



Procedure Modify 

Input: Graph G and a set $\mathcal P$ of (M, 3)-paths in G.

Output: Modified G'.

1. For every edge $e \in \bigcup_{P \in \mathcal P} E(P)$, in parallel: delete the edges that are incident to e.

2. Delete all edges from $\bigcup_{P \in \mathcal P} E(P)$.
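Every vertex of a path in $\mathcal P$ is an endpoint of one of its edges. Hence Steps 1 and 2 together delete exactly the edges with at least one endpoint on some path of $\mathcal P$. The Python sketch below uses this observation. It is a sequential illustration that assumes G is given as a symmetric adjacency dictionary and each path as a sequence of vertices.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set


def modify(adj: Dict[Hashable, Set[Hashable]],
           paths: Iterable[Sequence[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Sketch of Modify: remove path edges and every edge incident to one of them."""
    on_path = {v for p in paths for v in p}
    # An edge survives only if neither of its endpoints lies on a path of P.
    return {v: (set() if v in on_path else {u for u in ns if u not in on_path})
            for v, ns in adj.items()}
```

As the text notes, some surviving edges may no longer lie on any (M, 3)-path; those are removed later by Step 1 of Reduce, not by this function.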



Finally, in the last procedure of this section, we show how to translate a set
of paths $\tilde{\mathcal P}$ in $\tilde G$ to a set of paths $\mathcal P$ in G. Recall that G' denotes the subgraph
of G obtained by destroying triangles that contain edges from M, in step (2) of 
Reduce. 



Procedure Translate 

Input: Set $\tilde{\mathcal P}$ of independent (M, 3)-paths in $\tilde G$.
Output: Set $\mathcal P$ of independent (M, 3)-paths in G.

1. Identify the vertices $v^-$ and $v^+$ that correspond to one vertex $v \in V(G)$. As a
result we obtain from $\tilde{\mathcal P}$ cycles and paths in the graph G. These paths and cycles
have lengths of the form 3k, where $k \ge 1$, and are built up from (M, 3)-paths.

2. Treat (M, 3)-paths as edges between the endpoints of these paths. Now the problem
of selecting a set of independent (M, 3)-paths in G is reduced to that of finding a
substantial set of independent edges. Invoke an algorithm from [4] to obtain a set
$\mathcal P$ of (M, 3)-paths that are independent in G. The set $\mathcal P$ has the following property:
if the number of (M, 3)-paths in G' which are incident to edges of
$\bigcup_{P \in \tilde{\mathcal P}} E(P)$ is at least $\alpha m'_3$, then at least $\alpha m'_3/4$ (M, 3)-paths in
G' are incident to edges of $\bigcup_{P \in \mathcal P} E(P)$.
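Step 2 relies on an algorithm from [4] that is not reproduced here. As a hedged stand-in, the Python sketch below uses a plain sequential greedy choice to illustrate the reduction.

The sketch assumes each path of $\tilde{\mathcal P}$ is given as a 4-tuple `((tag, v1), m1, m2, (tag, v2))`. The paths are disjoint in $\tilde G$, so after identifying $v^-$ with $v^+$, two paths can share only endpoints. Keeping paths whose endpoints are still unused is therefore a greedy maximal matching in the graph whose edges are the paths.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Path4L = Tuple[Tuple[str, Hashable], Hashable, Hashable, Tuple[str, Hashable]]


def translate_greedy(paths_4l: Sequence[Path4L]) -> List[Tuple[Hashable, ...]]:
    """Stand-in for Translate: identify v^- with v^+, keep independent paths greedily."""
    used: Set[Hashable] = set()
    chosen = []
    for (_, v1), m1, m2, (_, v2) in paths_4l:
        # Step 1: dropping the -/+ tag identifies v^- and v^+ with v.
        # Step 2: treat the path as an edge v1 -- v2 and keep it if both ends are free.
        if v1 not in used and v2 not in used:
            used.update((v1, v2))
            chosen.append((v1, m1, m2, v2))
    return chosen
```

Each vertex v occurs at most once as $v^-$ and once as $v^+$, so this endpoint graph has maximum degree 2. Each kept path therefore blocks at most two others, and at least a third of the paths survive. This is a property of the greedy stand-in, not the guarantee stated for the algorithm from [4].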



Note that the resulting set P is a set of (M, 3)-paths which are independent 
in G. In addition the set is path-substantial with a constant specified in the next 
lemma. 

Lemma 6. If a set of paths $\tilde{\mathcal P}$ is $\xi$-path-substantial in $\tilde G$, then the set of paths
$\mathcal P$ obtained by procedure Translate is $\xi/16$-path-substantial in G.

Proof. The proof follows from Lemma 5 and Step (2) of Translate. 



3.3 Main Procedure 

Now we can summarize the discussion in our main algorithm. We denote by
$A \oplus B$ the symmetric difference between A and B.



Procedure Matching 

Input: Graph G.

Output: Matching M* such that $|M^*| \ge \frac{2}{3}\beta(G)$.

1. Find a maximal matching M using the procedure Match from [4].

2. Let $\mathcal P_1 := \emptyset$ and M* := M.

3. Iterate O(log n) times:

a) Invoke Reduce to obtain a 4L-graph $\tilde G$.

b) Invoke PathsIn4LGraph to obtain $\tilde{\mathcal P}$.

c) Use Translate to obtain $\mathcal P$.

d) Use Modify to modify G with respect to $\mathcal P$.

e) $\mathcal P_1 := \mathcal P_1 \cup \mathcal P$.

4. $M^* := M^* \oplus \big(\bigcup_{P \in \mathcal P_1} E(P)\big)$.
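Step 4 augments M along all accumulated paths at once. The Python sketch below illustrates this exchange under two assumptions: matching edges are unordered pairs, and the paths in $\mathcal P_1$ are vertex-disjoint (M, 3)-paths given as vertex sequences.

```python
from typing import FrozenSet, Hashable, Iterable, Sequence, Set, Tuple


def augment(matching: Iterable[Tuple[Hashable, Hashable]],
            paths: Iterable[Sequence[Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Step 4: M* := M xor (union of E(P)) for vertex-disjoint augmenting paths P."""
    m_star = {frozenset(e) for e in matching}
    for path in paths:
        for u, v in zip(path, path[1:]):
            m_star ^= {frozenset((u, v))}  # symmetric difference, one edge at a time
    return m_star


def is_matching(edges: Set[FrozenSet[Hashable]]) -> bool:
    """True if no two edges share an endpoint."""
    seen: Set[Hashable] = set()
    for e in edges:
        if seen & e:
            return False
        seen |= e
    return True
```

For a path $v_1 m_1 m_2 v_2$, the middle edge $\{m_1, m_2\}$ leaves the matching and the two outer edges enter it. Each path therefore increases the matching size by exactly one.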



Proof (of Theorem 3). First we establish the time complexity. We use O(log^4 n)
rounds in the first step to find M using the algorithm from [4]. In each iteration,
the main time complexity comes from PathsIn4LGraph, which by Lemma 4
is O(log^3 n). Since we have O(log n) iterations, the number of
steps is O(log^4 n).
By Lemma 4, the set of paths $\tilde{\mathcal P}$ found in each iteration is $\xi$-path-substantial
in $\tilde G$. Therefore, by Lemma 6, $\mathcal P$ is $\xi/16$-path-substantial in G,
and so in step (3d) a constant fraction of the (M, 3)-paths is deleted from G.
Since the number of (M, 3)-paths in G is O(n^3), there are no (M, 3)-paths left in
G after O(log n) iterations. Consequently, after step (3), $\mathcal P_1$ is a maximal set of
(M, 3)-paths. Thus, after exchanging edges in step (4), we obtain a matching M*
such that there is no (M*, 3)-path in G. Therefore, by Claim 4, $|M^*| \ge \frac{2}{3}\beta(G)$.

Final Remarks. An efficient randomized algorithm for approximating the maxi- 
mum matching can be obtained by invoking a randomized MIS procedure ([7]) 
in an auxiliary graph where the vertex set consists of augmenting paths and two 
vertices are connected if the corresponding paths share a vertex in the original 
graph. 
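The remark above can be illustrated by a small Python sketch. It builds the conflict graph of augmenting paths and runs a random-priority variant of Luby's MIS algorithm, simulated round by round on one machine. Whether [7] uses exactly this variant is an assumption, so treat the code as an illustration of the idea rather than the procedure cited.

```python
import random
from typing import Dict, Hashable, Sequence, Set


def conflict_graph(paths: Sequence[Sequence[Hashable]]) -> Dict[int, Set[int]]:
    """Vertices are path indices; two paths are adjacent if they share a vertex."""
    vsets = [set(p) for p in paths]
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(paths))}
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if vsets[i] & vsets[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def random_priority_mis(adj: Dict[int, Set[int]], seed: int = 0) -> Set[int]:
    """Round-by-round simulation of a Luby-style randomised MIS."""
    rng = random.Random(seed)
    active = set(adj)
    mis: Set[int] = set()
    while active:
        prio = {v: rng.random() for v in active}
        # A vertex joins when its priority is below that of every active neighbour.
        winners = {v for v in active
                   if all(prio[v] < prio[u] for u in adj[v] if u in active)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & active  # neighbours of new MIS vertices drop out
        active -= removed
    return mis
```

The selected indices correspond to vertex-disjoint augmenting paths, and maximality of the independent set means every other path conflicts with a chosen one.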

The augmenting paths technique can only be applied to an unweighted ver- 
sion of the problem; to approximate a maximum weighted matching a completely 
different approach must be taken. 
Abstract. Given a point set $\mathcal P = \{p_1, \dots, p_n\}$ in the d-dimensional unit hypercube,
we give upper bounds on the maximal expected number of extreme points
when each point $p_i$ is perturbed by small random noise. The noise is chosen independently
for each point from the same noise distribution Δ. Our results are parametrized by the
variance of the noise distribution. For large variance we essentially consider the
average case for distribution Δ, while for variance 0 we consider the worst case.
Hence our results give upper bounds on the number of extreme points where our
input distributions range from average case to worst case.

Our main contribution is a rather general lemma that can be used to obtain upper
bounds on the expected number of extreme points for a large class of noise
distributions. We then apply this lemma to obtain explicit bounds for random
noise coming from the Gaussian normal distribution of variance σ² and the
uniform distribution in a hypercube of side length ε. For these noise distributions
we show upper bounds of $O\big((1/\sigma)^d \cdot \log^{3/2\cdot d - 1} n\big)$ and $O\big((n \log n/\varepsilon)^{1 - 1/(d+1)}\big)$,
respectively. Besides its theoretical motivation, our model is also motivated by
the observation that in many applications of convex hull algorithms the input
data is inherently noisy, e.g. when the data comes from physical measurement or
imprecise arithmetic is used.

Keywords: Convex Hull, Average Complexity, Smoothed Analysis 



1 Introduction 



The convex hull of a point set in the d-dimensional Euclidean space is one of the funda- 
mental combinatorial structures in computational geometry. Many of its properties have 
been studied extensively in the last decades. In this paper we are interested in the number 
of vertices of the convex hull of a random point set, sometimes referred to as the number 
of extreme points of the point set. It has been known for more than 40 years that the number
of extreme points of a point set drawn uniformly at random from the (unit) hypercube
is $\Theta(\log^{d-1} n)$. The number of extreme points has also been studied for many other
input distributions, e.g. for the Gaussian normal distribution. Since the expectation is
the average over all inputs weighted by their probabilities, one also speaks of the average
case number of extreme points (and average case complexity, in general).

Research in the average case complexity of structures and algorithms is motivated 
by the assumption that a typical input in an application behaves like an input from the 
distribution considered for the average case analysis. One weakness of average case 
analysis is that in many applications the inputs have special properties. Compared to
such inputs, the distributions typically considered in the context of average case analysis
often provide an overly optimistic complexity measure.

While average case analysis may be too optimistic we also know that worst case 
complexity can be too pessimistic. For example, the simplex algorithm performs
extraordinarily well in practice, but for many variants it is known that their worst case
running time can be exponential.

Since the typical complexity of a structure or algorithm usually lies somewhere
between its average case complexity and its worst case complexity, there is a need for a
complexity measure that bridges the gap between average case and worst case. A hybrid
between average case and worst case analysis called smoothed analysis provides such a 
bridge. The idea of smoothed analysis is to weaken the worst case complexity by adding 
small random noise to any input instance. The smoothed complexity of a problem is 
then the worst case expected complexity, where the expectation is with respect to the 
random noise. 

In this paper, the input is a point set and we consider the number of extreme points. 
We add to each point a small random vector from a certain noise distribution (typically, 
Gaussian noise or noise drawn uniformly from a small hypercube). If the noise distri- 
bution is fixed then any point set becomes a distribution of points and the smoothed 
number of extreme points (for this noise distribution) is the maximum expected number 
of extreme points over all distributions. The noise is parametrized by its variance and so 
smoothed analysis provides an analysis for a wide variety of distributions ranging from 
the average case (for large variance) to the worst case (variance 0). 

Besides its appealing theoretical motivation the use of smoothed analysis is particu- 
larly well motivated in the context of computational geometry. In many applications of 
computational geometry and convex hull computation the input data comes from physical
measurements and thus has a certain error, because measuring devices have limited
accuracy. A standard assumption in physics is that this error is distributed according to
the Gaussian normal distribution. Thus we can use smoothed analysis with Gaussian
error to model inputs coming from physical measurements. Under the assumption that
physical measurements are not precise and that the error is Gaussian, smoothed analysis
provides a worst case analysis for this class of inputs.

Another interesting motivation for smoothed analysis in the context of computational 
geometry is that sometimes fixed precision arithmetic is used during computations. We 
consider the case that the input points are computed by fixed precision arithmetic and 
we model this scenario by the assumption that every point is distributed uniformly in a 
hypercube around its ’real’ position. 
1.1 Our Contribution

In this paper we consider the smoothed number of extreme points for random noise from
a large class of noise distributions. We consider an arbitrary input point set $\mathcal P$ in the unit
hypercube. Random noise from a noise distribution Δ is added to each point. We assume
that Δ is the d-fold product of a one dimensional distribution $\Delta_1$ with mean 0. This
assumption is satisfied for many natural distributions, including the uniform distribution
in a hypercube and the Gaussian normal distribution.

Our main technical contribution is a lemma that provides an upper bound on the
number of extreme points for any noise distribution satisfying the above conditions.
The bound itself depends on the variation of the local density of the distribution and
on two parameters that must be optimized by hand. We apply this lemma to noise from the
Gaussian normal distribution and from a cube with side length ε. We obtain the following
results (where we consider the dimension d to be a constant):

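- If Δ is the d-fold product of a one dimensional Gaussian normal distribution
of variance σ², we show that the smoothed number of extreme points is
$$O\Big(\Big(\frac{1}{\sigma}\Big)^{d}\cdot\log^{\frac{3}{2}d-1} n\Big).$$

- If Δ is a uniform distribution from a hypercube of side length ε, we prove that the
smoothed number of extreme points is
$$O\Big(\Big(\frac{n\cdot\log n}{\varepsilon}\Big)^{1-\frac{1}{d+1}}\Big).$$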

1.2 Related Work 

Several authors have treated the structure of the convex hull of n random points. In 
1963/64, Renyi and Sulanke [8,9] were the first to present the mean number of extreme 
points in the planar case for different stochastic models. They showed for n points
uniformly chosen from a convex polygon with r vertices a bound of O(r · log n). This
work was continued by Efron [4], Raynaud [6,7], and Carnal [2] and extended to higher
dimensions. For further information we refer to the excellent book by Santalo [10].

In 1978, Bentley, Kung, Schkolnick, and Thompson [1] showed that the expected
number of extreme points of n i.i.d. random points in d-space is $O(\ln^{d-1} n)$ for fixed d
for some general probability distributions. Har-Peled [5] gave a different proof of this
result. Both proofs are based on the computation of the expected number of maximal 
points (cf. Section 2). This is also related to computing the number of points on a Pareto 
front of a point set. 

The concept of smoothed analysis was introduced in 2001 by Spielman and Teng 
[11]. In 2003 this concept was applied to the number of changes to the combinatorial
description of the smallest enclosing bounding box of a moving point set [3], where
different probability distributions for the random noise were considered. Although the
two dimensional case of this paper's problem is covered in the bounding box paper [3],
the result was achieved by reduction to so-called left-to-right maxima in a sequence
of numbers. A similar reduction applied in higher dimensions could only reduce the
d-dimensional problem to a (different) (d − 1)-dimensional problem. Hence new techniques
are necessary to attack the problem in higher dimensions.



2 Preliminaries 

For a point set $\mathcal P := \{p_1, \dots, p_n\}$ in $\mathbb R^d$, let $\mathcal V(\mathcal P)$ denote the number of extreme points
(i.e. the number of vertices of the convex hull) of $\mathcal P$. Further, let Δ denote a probability
distribution over $\mathbb R^d$. Throughout the paper we only consider probability distributions
Δ that are the d-fold product of a one dimensional probability distribution $\Delta_1$ with mean
0 (e.g. uniform distribution in the unit hypercube, Gaussian normal distribution, etc.).
Let $\varphi : \mathbb R \to \mathbb R$ denote the probability density function of $\Delta_1$, and let
$\Phi(t) := \Pr[x \le t] = \int_{-\infty}^{t} \varphi(x)\,dx$ denote its probability distribution function. Then the
d-dimensional probability distribution of Δ is given by

$$\Pr\big[x^{(1)} \le t^{(1)}, \dots, x^{(d)} \le t^{(d)}\big] = \int_{-\infty}^{t^{(1)}} \cdots \int_{-\infty}^{t^{(d)}} \varphi(x^{(1)}) \cdots \varphi(x^{(d)})\, dx^{(d)} \cdots dx^{(1)} .$$

For a given point set $\mathcal P = \{p_1, \dots, p_n\}$ in the unit d-hypercube, we denote by
$\mathcal P(\Delta) = \{\tilde p_1, \dots, \tilde p_n\}$ the distribution of points obtained by adding random noise from
distribution Δ to the points in $\mathcal P$. The distribution of point $\tilde p_i$ is given by $\tilde p_i = p_i + r_i$,
where $r_i$ is a random vector chosen from Δ, independently of the other $r_j$'s. The probability
distribution of $\mathcal P(\Delta)$ is then given by combining the distributions of the $\tilde p_i$. We
will parametrize Δ by its variance. Thus our results depend on the ratio between variance
and diameter of the point set. When the noise distribution is clear from the context, we
will also use $\tilde{\mathcal P}$ to abbreviate $\mathcal P(\Delta)$.

We define the smoothed number $\mathcal V(\mathcal P(\Delta))$ of extreme points to be the maximal
expected number of extreme points in a perturbed point set over all sets $\mathcal P$ in the unit
hypercube, i.e. $\max_{\mathcal P \subset [0,1]^d} \mathbf E[\text{number of extreme points in } \mathcal P(\Delta)]$.

2.1 Outline 

Our bounds are based on the following observation: a point $p \in \mathcal P$ is not extreme if
each of the $2^d$ orthants centered at p contains at least one point. In this case, we say
that p is not maximal. It follows immediately that the number of maximal points is an
upper bound on the number of extreme points. Therefore, from now on we count
the number of maximal points.
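The observation can be checked numerically. The Python sketch below counts maximal points by brute force over the $2^d$ orthants. It then compares that count with the number of convex hull vertices in the plane, computed by Andrew's monotone chain (used here only as a reference).

Orthants are taken to be open (strict inequalities). This is an assumption that matters only for ties, which occur with probability 0 for continuous noise.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def is_maximal(p: Sequence[float], points: List[Sequence[float]]) -> bool:
    """True if at least one of the 2^d open orthants centred at p contains no point."""
    for signs in itertools.product((-1, 1), repeat=len(p)):
        occupied = any(all(s * (qi - pi) > 0 for s, qi, pi in zip(signs, q, p))
                       for q in points)  # p itself is never strictly inside
        if not occupied:
            return True
    return False


def count_maximal(points: List[Sequence[float]]) -> int:
    return sum(is_maximal(p, points) for p in points)


def hull_size_2d(points: List[Tuple[float, float]]) -> int:
    """Number of convex hull vertices (Andrew's monotone chain), for comparison."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list = []
    upper: list = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return len(lower) + len(upper) - 2
```

For example, for the four corners of the unit square plus its centre, both functions return 4; the centre has a point in each of its four orthants. On random inputs, `hull_size_2d(pts) <= count_maximal(pts)` always holds.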

In Section 3 we consider the average case for point sets of n points drawn independently
from a distribution Δ of the form described above. We show that for every such Δ the
expected number of extreme points is $O(\log^{d-1} n)$. We remark that this result is well
known. We present it to illustrate our approach in the smoothed case.

In Section 4 we develop our main lemma and apply it to obtain upper bounds on 
the smoothed number of extreme points for noise coming from the Gaussian normal 
distribution and the uniform distribution in a hypercube. 
Fig. 1. 'A point in three dimensional space has eight orthants.' and 'Every extreme point is also a
maximal point.'



3 Average Case 



We illustrate our approach on the well-understood case of n points in $\mathbb R^d$ chosen from
a probability distribution Δ that is the d-fold product of a one dimensional probability
distribution. We show how to obtain in this case an upper bound of $O(\log^{d-1} n)$ on the
number of maximal points and hence on the number of extreme points.

Theorem 1. Let $\mathcal P = \{p_1, \dots, p_n\}$ be a set of n i.i.d. random points in $\mathbb R^d$ chosen
according to the probability distribution Δ. Then the expected number of extreme points
of $\mathcal P$ is $O(\log^{d-1} n)$ for fixed dimension d.

Proof. To prove the theorem we show that

$$\Pr[p_k \text{ is maximal}] = O\big(\log^{d-1} n / n\big) . \tag{1}$$

By linearity of expectation, it follows immediately that the expected number of extreme points
is $O(\log^{d-1} n)$. To prove (1) we consider the probability that a fixed orthant $o(p_k)$ is
empty. Using a standard union bound we get

$$\Pr[p_k \text{ is maximal}] \le 2^d \cdot \Pr[o(p_k) \text{ is empty}] .$$

W.l.o.g. we now fix the orthant $o(p_k) := (-\infty, p_k^{(1)}] \times \cdots \times (-\infty, p_k^{(d)}]$ and write the probability
that $o(p_k)$ is empty as an integral. For better readability we first establish the
2-dimensional integral.

Consider $p_k$ having the coordinates (x, y). The probability that any other point
$p_j \in \mathcal P \setminus \{p_k\}$ does not lie in $o(p_k)$ is then equal to the probability that $p_j$ lies in one
of the other orthants of $p_k$. E.g. for the first orthant $[p_k^{(1)}, \infty) \times [p_k^{(2)}, \infty) =: o_1$ this
probability is

$$\Pr[p_j \in o_1] = \Pr\big[p_j^{(1)} \ge x\big] \cdot \Pr\big[p_j^{(2)} \ge y\big] = \big(1 - \Phi(x)\big) \cdot \big(1 - \Phi(y)\big) .$$

The probabilities for the remaining two orthants can be expressed similarly, yielding

$$\Pr[p_j \notin o(p_k)] = \big(1 - \Phi(x)\big) \cdot \Phi(y) + \big(1 - \Phi(x)\big) \cdot \big(1 - \Phi(y)\big) + \Phi(x) \cdot \big(1 - \Phi(y)\big) = 1 - \Phi(x) \cdot \Phi(y) .$$
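Since there are n − 1 other points in $\mathcal P$, the probability that $o(p_k)$ is empty is exactly

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \varphi(x)\cdot\varphi(y)\cdot\big(1-\Phi(x)\cdot\Phi(y)\big)^{n-1}\,dx\,dy . \tag{2}$$

We solve this integral by the substitution $z := 1 - \Phi(x)\cdot\Phi(y)$, where $dz = -\varphi(x)\cdot\Phi(y)\cdot dx$. We
get

$$\int_{-\infty}^{\infty}\frac{\varphi(y)}{\Phi(y)}\int_{1-\Phi(y)}^{1} z^{n-1}\,dz\,dy = \frac{1}{n}\int_{-\infty}^{\infty}\frac{\varphi(y)}{\Phi(y)}\Big(1-\big(1-\Phi(y)\big)^{n}\Big)\,dy .$$

Another substitution, $z := 1 - \Phi(y)$ with $dz = -\varphi(y)\cdot dy$, yields

$$\frac{1}{n}\int_{0}^{1}\frac{1-z^{n}}{1-z}\,dz = \frac{1}{n}\int_{0}^{1}\sum_{i=0}^{n-1} z^{i}\,dz = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{i} = \frac{\log n + O(1)}{n} .$$

Since there are 4 orthants and n points, it follows that the expected number of maximal
points in the planar case is O(log n).

The integral (2) for the d-dimensional case is of the following form and can be solved
by repeated substitutions in a similar fashion:

$$\int_{\mathbb R^d}\varphi(x^{(1)})\cdots\varphi(x^{(d)})\cdot\Big(\sum_{I\subsetneq[d]}\prod_{i\in I}\Phi(x^{(i)})\prod_{i\notin I}\big(1-\Phi(x^{(i)})\big)\Big)^{n-1}dx^{(1)}\cdots dx^{(d)} = \frac{1}{n}\sum_{i_1=1}^{n}\frac{1}{i_1}\sum_{i_2=1}^{i_1}\frac{1}{i_2}\cdots\sum_{i_{d-1}=1}^{i_{d-2}}\frac{1}{i_{d-1}} .$$

Each of the d − 1 nested sums is at most $\sum_{i=1}^{n} 1/i = O(\log n)$. Hence the expected number
of extreme points in d dimensions is $O(\log^{d-1} n)$.

□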
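The exact value of the nested sum can be checked against a simulation. The Python sketch below computes $n \cdot \Pr[o(p_k) \text{ is empty}]$, written as $g_d(n)$, via the recursion $g_1 = 1$ and $g_{k+1}(m) = \sum_{i=1}^{m} g_k(i)/i$ that the nested sum encodes. It then estimates the same quantity in the plane by Monte Carlo.

After the substitution $u = \Phi(x)$ the integral no longer depends on Φ, for any continuous Φ. Sampling uniform points is therefore an assumption made only for convenience. In 2D, a point has an empty lower-left orthant exactly when its y-coordinate is a new minimum in a sweep by increasing x.

```python
import random
from typing import List, Tuple


def nested_harmonic(n: int, d: int) -> float:
    """g_d(n) = n * Pr[o(p_k) is empty], with g_1 = 1, g_{k+1}(m) = sum_{i<=m} g_k(i)/i."""
    g = [1.0] * (n + 1)  # g_1(m) = 1 (index 0 is unused)
    for _ in range(d - 1):
        nxt = [0.0] * (n + 1)
        for m in range(1, n + 1):
            nxt[m] = nxt[m - 1] + g[m] / m
        g = nxt
    return g[n]


def count_lower_left_empty(points: List[Tuple[float, float]]) -> int:
    """Points whose orthant (-inf, x] x (-inf, y] contains no other point."""
    count, best_y = 0, float("inf")
    for _, y in sorted(points):  # sweep by increasing x
        if y < best_y:           # no point with smaller x lies below this one
            count += 1
            best_y = y
    return count


def simulate(n: int = 200, trials: int = 2000, seed: int = 1) -> float:
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        total += count_lower_left_empty(pts)
    return total / trials
```

For n = 200, `nested_harmonic(200, 2)` is the harmonic number $H_{200} \approx 5.88$, and `simulate()` should land close to it. The expected number of maximal points is then at most $2^d \cdot g_d(n)$.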



4 Smoothed Case 

We now want to apply the same approach to obtain an upper bound on the smoothed
number of extreme points. We consider a perturbed point $\tilde p_k = p_k + r_k$, where
$p_k = (p_k^{(1)}, \dots, p_k^{(d)}) \in [0,1]^d$ and $r_k$ is a random d-vector. This vector is chosen from a
probability distribution which is the d-fold product of a one dimensional probability
distribution of mean 0. Let again $\varphi : \mathbb R \to \mathbb R$ be the probability density of that distribution.

Again we want to bound the probability that a fixed orthant of a perturbed point
is empty. We make the observation that the perturbed point $\tilde p_k$ is a random vector from a
probability distribution of mean $p_k$. Its i-th component has the one dimensional density
function $\varphi(x - p_k^{(i)})$ and distribution function $\Phi(x - p_k^{(i)})$. In other words, the perturbed
point $\tilde p_k$ has a slightly shifted density and distribution function. Then

$$\Pr[o(\tilde p_k) \text{ is empty}] = \int_{\mathbb R^d}\prod_{i=1}^{d}\varphi\big(x^{(i)}-p_k^{(i)}\big)\cdot\prod_{j\ne k}\Big(\sum_{I\subsetneq[d]}\prod_{i\in I}\Phi\big(x^{(i)}-p_j^{(i)}\big)\prod_{i\notin I}\Big(1-\Phi\big(x^{(i)}-p_j^{(i)}\big)\Big)\Big)\,dx^{(1)}\cdots dx^{(d)} .$$
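The main idea is now to subdivide the unit hypercube into $m = 1/\delta^d$ smaller
axis-aligned hypercubes of side length δ. Then we subdivide $\mathcal P$ into sets $C_1, \dots, C_m$, where
$C_\ell$ is the subset of $\mathcal P$ that is located (before the perturbation) in the ℓ-th small hypercube
(we assume some ordering among the small hypercubes). Now we can bound the expected
number of maximal points of each perturbed set $\tilde C_\ell$ and use

$$\mathcal V(\tilde{\mathcal P}) \le \sum_{\ell=1}^{m}\mathcal V(\tilde C_\ell) \tag{3}$$

to obtain an upper bound on the expected number of extreme points in $\tilde{\mathcal P}$. Inequality (3)
holds because every extreme point of $\tilde{\mathcal P}$ is also an extreme point of the part $\tilde C_\ell$
containing it. The advantage of this approach is that, for small enough δ, the points in a
single small hypercube behave almost as in the unperturbed random case.

We now want to compute the expected number of extreme points for the sets $\tilde C_\ell$.
We assume w.l.o.g. that the set $C_\ell$ is in the hypercube $[0, \delta]^d$ and that $C_\ell$ is of size
$c_\ell$. Let $\bar\delta = (\delta, \dots, \delta)$ and $\bar 0 = (0, \dots, 0)$. Again we aim to find an upper bound on
the probability that, for a point $\tilde p_k \in \tilde C_\ell$, the orthant $o(\tilde p_k)$ is empty. This probability is
maximized if $p_k = \bar 0$ and $p_j = \bar\delta$ for every $p_j \in C_\ell$, $p_j \ne p_k$. Hence we get

$$\Pr[o(\tilde p_k) \text{ is empty}] \le \int_{\mathbb R^d}\varphi(x^{(1)})\cdots\varphi(x^{(d)}) \tag{4}$$
$$\qquad\cdot\Big(\sum_{I\subsetneq[d]}\prod_{i\in I}\Phi\big(x^{(i)}-\delta\big)\prod_{i\notin I}\Big(1-\Phi\big(x^{(i)}-\delta\big)\Big)\Big)^{c_\ell-1}dx^{(1)}\cdots dx^{(d)} . \tag{5}$$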



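In the next step we expand the product of density functions in (4) by a multiplication
in the following way:

$$\varphi(x^{(1)})\cdots\varphi(x^{(d)}) = \varphi\big(x^{(1)}-\delta\big)\cdots\varphi\big(x^{(d)}-\delta\big)\cdot\frac{\varphi(x^{(1)})}{\varphi(x^{(1)}-\delta)}\cdots\frac{\varphi(x^{(d)})}{\varphi(x^{(d)}-\delta)} .$$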



Where the ratios are bounded we can extract them from the integral and solve the 
remaining integral as in the average case analysis. We summarize this technique in the 
following crucial lemma: 



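Main Lemma. Let φ be the density of the probability distribution of the random noise.
For positive parameters δ and r define the set $Z_{\delta,r} := \{x \in \mathbb R : \varphi(x)/\varphi(x-\delta) > r\}$. Let Z
be the probability that at least one coordinate of a noise vector lies in $Z_{\delta,r}$, i.e.

$$Z := \sum_{i=0}^{d-1}\binom{d}{i}\Big(\int_{\mathbb R\setminus Z_{\delta,r}}\varphi(x)\,dx\Big)^{i}\Big(\int_{Z_{\delta,r}}\varphi(x)\,dx\Big)^{d-i} .$$

The smoothed number of extreme points is then

$$\mathcal V(\tilde{\mathcal P}) \le 2^{d}\cdot\Big(\Big(\frac{r}{\delta}\Big)^{d}\cdot\log^{d-1} n + n\cdot Z\Big) .$$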
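Proof. For better readability we first look only at the 2-dimensional case. We consider
again the points $C_\ell \subseteq \mathcal P$ lying in the small hypercube $[0, \delta]^2$. After some standard
simplifications the integral (4, 5) is of the form

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\varphi(x)\cdot\varphi(y)\cdot\big(1-\Phi(x-\delta)\cdot\Phi(y-\delta)\big)^{c_\ell-1}\,dx\,dy . \tag{6}$$

Recall that $Z_{\delta,r} := \{x \in \mathbb R : \varphi(x)/\varphi(x-\delta) > r\}$ is the subset of $\mathbb R$ that contains all elements
x for which the ratio $\varphi(x)/\varphi(x-\delta)$ is larger than r. Where the ratio of densities is
bounded, we expand the product of the density functions in the described way and pull
the factor r in front of the integral. We obtain an upper bound on integral (6) of

$$r^{2}\iint_{(\mathbb R\setminus Z_{\delta,r})^{2}}\varphi(x-\delta)\,\varphi(y-\delta)\,\big(1-\Phi(x-\delta)\,\Phi(y-\delta)\big)^{c_\ell-1}\,dx\,dy \tag{7}$$
$$+\iint_{(\mathbb R\setminus Z_{\delta,r})\times Z_{\delta,r}}\varphi(x)\,\varphi(y)\,\big(1-\Phi(x-\delta)\,\Phi(y-\delta)\big)^{c_\ell-1}\,dx\,dy \tag{8}$$
$$+\iint_{Z_{\delta,r}\times(\mathbb R\setminus Z_{\delta,r})}\varphi(x)\,\varphi(y)\,\big(1-\Phi(x-\delta)\,\Phi(y-\delta)\big)^{c_\ell-1}\,dx\,dy \tag{9}$$
$$+\iint_{Z_{\delta,r}\times Z_{\delta,r}}\varphi(x)\,\varphi(y)\,\big(1-\Phi(x-\delta)\,\Phi(y-\delta)\big)^{c_\ell-1}\,dx\,dy . \tag{10}$$

The d-dimensional integral can be split up in exactly the same way. The first part (7) can
easily be solved by applying the average case bound of Theorem 1 to the shifted
distribution. This yields a bound of $r^{d}\cdot\log^{d-1} c_\ell / c_\ell$ in the d-dimensional case. For the other
integrals (8, 9, 10) we bound the factor $\big(1-\prod_{i=1}^{d}\Phi(x^{(i)}-\delta)\big)^{c_\ell-1}$ by 1.
This yields for the remaining sum of integrals an upper bound of

$$\sum_{i=0}^{d-1}\binom{d}{i}\Big(\int_{\mathbb R\setminus Z_{\delta,r}}\varphi(x)\,dx\Big)^{i}\Big(\int_{Z_{\delta,r}}\varphi(x)\,dx\Big)^{d-i},$$

which we denote by Z. Now we conclude that

$$\mathcal V(\tilde C_\ell) \le 2^{d}\cdot r^{d}\cdot\log^{d-1} c_\ell + c_\ell\cdot 2^{d}\cdot Z .$$

By (3), the main lemma follows immediately. □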

In the following two subsections we want to apply the main lemma to two random 
noise distributions, namely we consider Gaussian normal noise and uniform noise. 



4.1 Normal Noise 

For normally distributed noise we will show the following theorem. 
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Theorem 2. The smoothed number of extreme points under random noise from the 
standard normal distribution N(0, a) is 



for fixed dimension d. 



■1 X 

Proof. Let pyx) := ^ • e ^ be the standard normal density function with expec- 

tation 0 and variance In order to utilize the main lemma we choose (5 := u/B where 



B := 






For X < — 2crB it holds that 



P{x) 

p{x — i5) 



= e"^ 






20-^ = e 



< + 7b2 



< e'^ 



Therefore, if we choose r := we have C (— oo, —2aB], The next crucial step is 
to bound Z given by 



2=0 ^ 



-2aB 



r-2(TB 






/ — OO J —2(tB J —2(tB 



d—i 



d-1 






2 = 0 



-2(tB 



d—i 



p{x) dx 



( 11 ) 



From probability theory it is well known that we can estimate the tail of the normal distribution N(0, σ) in the following way: $\int_{-\infty}^{-k\sigma}\varphi(x)\,dx \le e^{-k^2/4}$ for $k \ge 0$. With $k = 2B$ this gives

$$\int_{-\infty}^{-2\sigma B}\varphi(x)\,dx \le e^{-B^2} = \sqrt[d]{1+\frac{1}{n}}-1. \qquad (12)$$



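Combining (11) and (12) we get that

$$Z \le \sum_{i=0}^{d-1}\binom{d}{i}\left(\sqrt[d]{1+\frac{1}{n}}-1\right)^{d-i} = \left(1+\sqrt[d]{1+\frac{1}{n}}-1\right)^{d}-1 = \frac{1}{n}.$$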



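Altogether we can apply the main lemma with the earlier chosen δ, with $r = e^3$ and $Z \le 1/n$. Since $B = O(\sqrt{\log n})$ for fixed d, this gives that the smoothed number of extreme points is at most

$$\frac{2^d\cdot e^{3d}\cdot B^d}{\sigma^d}\cdot\log^{d-1} n + 1 = O\left(\frac{1}{\sigma^d}\cdot\log^{3/2\cdot d-1} n\right),$$

as desired.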



□ 
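The parameter choices in the proof of Theorem 2 can be checked numerically. The following Python sketch is our own illustration, assuming the reconstructed definitions of B, δ = σ/B, r = e^3 and the bound (11); the function names are hypothetical and do not come from the paper.

```python
import math

def normal_noise_parameters(n, d, sigma):
    """Return (t, B, delta, r) as chosen in the proof of Theorem 2."""
    t = (1.0 + 1.0 / n) ** (1.0 / d) - 1.0  # right-hand side of (12)
    B = math.sqrt(math.log(1.0 / t))         # chosen so that e^{-B^2} = t
    return t, B, sigma / B, math.e ** 3

def density_ratio(x, delta, sigma):
    """phi(x) / phi(x - delta) for the N(0, sigma) density; normalisation cancels."""
    return math.exp(((x - delta) ** 2 - x ** 2) / (2.0 * sigma ** 2))

def lower_tail(B):
    """P[X <= -2*sigma*B] for X ~ N(0, sigma); this does not depend on sigma."""
    return 0.5 * math.erfc(math.sqrt(2.0) * B)

def z_upper_bound(t, d):
    """Right-hand side of (11) after inserting (12): sum_{i<d} C(d,i) t^(d-i)."""
    return sum(math.comb(d, i) * t ** (d - i) for i in range(d))

n, d, sigma = 1000, 3, 0.1
t, B, delta, r = normal_noise_parameters(n, d, sigma)
print(z_upper_bound(t, d) * n)  # approximately 1.0, i.e. Z <= 1/n
print(lower_tail(B) <= t)       # True: the exact tail is far below e^{-B^2}
```

The exact Gaussian tail is used only as a sanity check of the weaker estimate $e^{-k^2/4}$ quoted in the proof.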
4.2 Uniform Noise



We consider random noise that is uniformly distributed in a hypercube of side length 2ε centered at the origin, i.e., we have the one-dimensional density function

$$\varphi(x) = \begin{cases}\frac{1}{2\varepsilon} & \text{if } x\in[-\varepsilon,\varepsilon],\\ 0 & \text{otherwise}\end{cases}$$

for the random noise. We show

Theorem 3. The smoothed number of extreme points under uniform random noise from
a hypercube of side length 2ε is



O 



n ■ log 71 ^“*+ 



for fixed dimension d. 

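Proof. Again we want to utilize our main lemma. We choose r = 1; then it follows immediately that for 0 < δ < 2ε we have $I_{\delta,r}\cap[-\varepsilon,\varepsilon] = [\delta-\varepsilon,\varepsilon]$. We now have to compute Z, which is

$$Z = \sum_{i=0}^{d-1}\binom{d}{i}\left(\frac{2\varepsilon-\delta}{2\varepsilon}\right)^{i}\cdot\left(\frac{\delta}{2\varepsilon}\right)^{d-i} \le \frac{\delta}{2\varepsilon}\cdot\left(2^d-1\right).$$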

With the main lemma it follows that the smoothed number of extreme points is at most 

d c- \ 



■yd 



■ log^ ^ n + n ■ d ■ ^ 



If we choose S = O ^ ^ n " ^ 



we obtain the desired bound. 



□ 
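For uniform noise, Z has the closed form $1-(1-\delta/(2\varepsilon))^d$, so the bound $Z \le \frac{\delta}{2\varepsilon}(2^d-1)$ used in the proof of Theorem 3 is easy to check. The following minimal Python sketch assumes $I_{\delta,r}\cap[-\varepsilon,\varepsilon] = [\delta-\varepsilon,\varepsilon]$, as stated in the proof. The function names are ours.

```python
from math import comb

def uniform_z(d, delta, eps):
    """Z for uniform noise on [-eps, eps]^d, assuming 0 < delta < 2*eps."""
    bad = delta / (2.0 * eps)  # noise mass of the complement of I_{delta,r}
    good = 1.0 - bad           # noise mass of I_{delta,r}
    return sum(comb(d, i) * good ** i * bad ** (d - i) for i in range(d))

def uniform_z_bound(d, delta, eps):
    """The bound (delta / (2 eps)) * (2^d - 1) from the proof of Theorem 3."""
    return delta / (2.0 * eps) * (2 ** d - 1)
```

Every summand has at least one factor δ/(2ε), and the remaining factors are at most 1, which is why the bound holds.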



5 Conclusion 

We analyzed the smoothed number of extreme points, which is a measure for the complexity of the convex hull of point sets under random noise. We derived bounds for several general noise distributions and gave explicit bounds for random noise from the Gaussian normal distribution and the uniform distribution in a hypercube.
There are many interesting problems concerning the smoothed complexity of geometric objects. For example, it would be interesting to analyze the smoothed number of faces of the convex hull of a point set. It would also be interesting to obtain similar results for other geometric structures like the Delaunay triangulation or the Voronoi diagram. Finally, it would be nice to either improve our bounds or find matching lower bounds.
Abstract. We study the fixed parameter tractability of the parameterized counting and decision versions of the restrictive H-coloring problem. These problems are defined by fixing the number of preimages of a subset C of the vertices in H through a partial weight assignment (H,C,K). We consider two families of partial weight assignments: the simple and the plain ones. For simple partial weight assignments we show an FPT algorithm for counting list (H,C,K)-colorings and faster algorithms for its decision version. For the more general class of plain partial weight assignments we give an FPT algorithm for the (H,C,K)-coloring decision problem. We introduce the concept of compactor and an algorithmic technique, compactor enumeration, that allow us to design the FPT algorithms for the counting version (and probably to export the technique to other problems).



1 Introduction 

Given two graphs G and H, a homomorphism from G to H is any function mapping the vertices in G to vertices in H in such a way that the image of an edge is also an edge. In the case that H is fixed, such a homomorphism is called an H-coloring of G. For a given graph H, the H-coloring problem asks whether there exists an H-coloring of the input graph G. The more general version in which a list of allowed colors (vertices of H) is given for each vertex is known as the list H-coloring. The case in which the numbers of preimages of the vertices in H are part of the input is known as the restrictive H-coloring or the restrictive list H-coloring. Besides the decision versions of these problems, we also consider the problems associated with counting the number of solutions; in particular we consider the #H-coloring, the list #H-coloring, the restrictive #H-coloring, and the restrictive list #H-coloring. The known results concerning the complexity of all the problems defined so far are summarized in Table 1.

* Partially supported by the EU within FP6 under contract 001907 (DELIS). The first author was further supported by the Distinció per a la Recerca of the Generalitat de Catalunya. The second and third authors were further supported by the Spanish CICYT project TIC-2002-04498-C05-03.



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 275-286, 2004. 
Table 1. Complexity of the parameterized H-coloring problems

Problem                          P versus NP-complete/#P-complete
list H-coloring                  dichotomy (1) [9]
H-coloring                       dichotomy (2) [12]
list #H-coloring                 dichotomy (3) [3,13]
#H-coloring                      dichotomy (3) [7]
restrictive list H-coloring      dichotomy (3) [3]
restrictive H-coloring           dichotomy (3) [3]
restrictive list #H-coloring     dichotomy (3) [3]
restrictive #H-coloring          dichotomy (3) [3]
list (H,C,K)-coloring            dichotomy (3) [2,5]
(H,C,K)-coloring                 P: list (H−C)-coloring in P
                                 NP-hard: (H−C)-coloring NP-hard [2,5]
list #(H,C,K)-coloring           dichotomy (3) [2,5]
#(H,C,K)-coloring (H fixed)      P: list #(H−C)-coloring in P
                                 #P-complete: (H,C,K) irreducible [2,5]

(1) H is a bi-arc graph.
(2) H is bipartite or contains a loop.
(3) All the connected components of H are either complete reflexive graphs or complete irreflexive bipartite graphs.



A parameterization for the restrictive H-coloring problem was introduced in [2], and it is denoted as (H,C,K)-coloring: it is the parameterized version of the restrictive H-coloring, where C is the set of vertices with restrictions and K is a weight assignment on C defining the number of preimages. (H,C,K) is called the partial weight assignment that fixes the parameterized part of the problem. We studied the complexity of the (H,C,K)-coloring in [2,5] (see also [4]) and the results are summarized in Table 1.

Recall that an FPT algorithm for a parameterized problem with input size n and parameter k is one that solves the problem in time O(f(k)n^{O(1)}). We isolate a subclass of the problem, the simple partial weight assignments (see the definition later), for which we were able to design linear-time fixed parameter algorithms for both the counting and the decision versions of the list (H,C,K)-coloring problem. In the case of the (H,C,K)-coloring we can isolate a larger class, the plain partial weight assignments, for which the (H,C,K)-coloring problem is in the class FPT.

One of the fundamental tools for designing FPT algorithms for decision problems is the so-called reduction to problem kernel. The method consists of an FPT self-reduction (in polynomial time in n and k) that transforms any problem input x into a reduced instance f(x) of the same problem, the kernel, such that the size of f(x) depends only on some function of k. (See [6] for discussions on fixed parameter tractability and the ways of constructing kernels.)

In recent years, there has been a big effort focused on developing a theory of intractability for parameterized counting problems (see for example [10,14]). However, so far, little progress has been made in the development of algorithmic techniques for counting problems. We introduce a new algorithmic tool, called compactor enumeration, that we expect will play a role similar to kernelization for counting problems. We define a general type of structure, called compactor, which contains a certificate for each set in a suitable partition of the solution space. The size of such a structure (as well as the sizes of its elements) depends only on the parameterized part of the problem. Formally, given a parameterized problem Π, where k is the size of the parameter, let Sol(Π,k) be the set of solutions. We say that Cmp(Π,k) is a compactor if

— |Cmp(Π,k)| is a function that depends only on k.

— Cmp(Π,k) can be enumerated with an FPT algorithm.

— There is a surjective function m : Sol(Π,k) → Cmp(Π,k).

— For any ℓ ∈ Cmp(Π,k), |m^{−1}(ℓ)| can be computed by an FPT algorithm.

Observe that if the above conditions hold, then by enumerating all certifying structures and computing, for each of them, the number of solutions they certify, we obtain an FPT algorithm for counting the solutions of the initial problem.
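The counting scheme behind a compactor (enumerate the certificates, count the preimages of each, and add up) can be illustrated on a toy problem that is not taken from the paper: counting the k-element subsets of a ground set partitioned into classes Q_1, ..., Q_m. In this sketch the certificate of a subset S is the vector (|S ∩ Q_1|, ..., |S ∩ Q_m|). There are at most (k+1)^m certificates, and the preimage count of each is a product of binomial coefficients. The function name is hypothetical.

```python
from itertools import product
from math import comb, prod

def count_via_compactor(class_sizes, k):
    """Count k-subsets of a set split into classes of the given sizes.

    Toy illustration only: each certificate is a vector of per-class counts
    summing to k, and its number of preimages is prod_j C(|Q_j|, c_j).
    """
    total = 0
    for cert in product(range(k + 1), repeat=len(class_sizes)):
        if sum(cert) != k:
            continue  # not the certificate of any k-subset
        # |m^{-1}(cert)|; math.comb returns 0 when c > size (infeasible certificate)
        total += prod(comb(size, c) for size, c in zip(class_sizes, cert))
    return total
```

For class sizes (3, 5, 2) and k = 4 the sum over certificates reproduces comb(10, 4) = 210, as Vandermonde's identity predicts.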

We stress that our method is completely different from the method of reduction to a problem kernel since, in our approach, we essentially reduce the counting problem to an enumeration problem. Moreover, in the reduced problem the instance is not a subset of the instances of the initial problem, as it is in the kernel reduction case. In some of our applications the compactor is a structure consisting of a suitable collection of functions. We stress that our new method is also applicable to parameterized decision problems where the technique of reduction to a kernel seems to fail, like for instance the (H,C,K)-coloring (see Section 4).

The complexity of all our FPT algorithms for counting is linear in n, and it is of the form O(f_1(k)n + f_2(k)g + f_3(k) log n + f_4(k)), where g is the number of connected components of G and the functions f_i increase in complexity as i grows. Although the counting algorithms can be used to solve the decision version of the problem, we design specific algorithms for the decision version where the contribution of the parameter is similar or of lower order, with the exception that the log n term disappears. To achieve these fast FPT algorithms for particular subcases of our problems we combine compactors with different design tools like reduction to a kernel, bounded search trees, and parameterized reduction (see for example [15] for a survey on tools for parameterizing algorithms).

Finally, let us mention that several parameterized problems can be modeled as particular cases of the (H,C,K)-coloring in which (H,C,K) is simple. Therefore, the techniques developed in this paper can be used to produce FPT algorithms for both the counting and the decision versions. For instance, counting the number of vertex covers of size k is known to be in FPT by [10]; this result can easily be obtained from this paper. Also, the following parameterization of Partition, "Given integers x_1, ..., x_n and a parameter k, is there a set I ⊆ [n] such that … ?", has an FPT algorithm for both the counting and the decision problems.
It remains an open question whether the classes of parameterized assignments that we present in this paper are the widest possible ones that correspond to parameterized problems solvable by FPT algorithms.

2 Definitions 

We use standard notation for graphs. Following the terminology in [8,9], we say that a graph is reflexive if all its vertices have a loop, and that a graph is irreflexive when none of its vertices is looped. We denote by g = g(G) the number of connected components of G. For U ⊆ V(G), we use G − U as a notation for the graph G[V(G) − U]. Given two graphs G and G', define G ∪ G' = (V(G) ∪ V(G'), E(G) ∪ E(G')) and G ⊕ G' = (V(G) ∪ V(G'), E(G) ∪ E(G') ∪ {{u,v} | u ∈ V(G), v ∈ V(G')}). Given a connected bipartite graph G, the vertex set V(G) is split into two sets X(G) and Y(G) so that every edge has one endpoint in the X-part and the other in the Y-part. Observe that the X and Y parts of a connected bipartite graph are defined without ambiguity. K_{r,s} denotes the complete irreflexive bipartite graph on two sets with r and s vertices each; notice that K_{r,0} denotes a complete bipartite graph, and K^r_n denotes the reflexive clique with n vertices. N_G(v) denotes the set of neighbors of v in the graph G.

Given two functions φ : A → N and ψ : A' → N with A ∩ A' = ∅, define (φ ∪ ψ)(x) to be φ(x) for x ∈ A, and ψ(x) for x ∈ A'. Given two functions φ, ψ : A → N, φ + ψ denotes their sum, and we say that φ ≤ ψ if φ(x) ≤ ψ(x) for any x ∈ A. We denote by ∅ the empty function. We often need contiguous sets of indices: for n, m ∈ N, we define [n] = {1, ..., n}, [−m] = {−m, ..., −1}, [−m, n] = {−m, ..., −1, 0, 1, ..., n}, and ⟨n⟩ = {0, 1, ..., n}.

Given two graphs G and H, we say that χ : V(G) → V(H) is an H-coloring of G if {χ(x), χ(y)} ∈ E(H) for any {x,y} ∈ E(G).

A partial weight assignment on a graph H is a triple (H,C,K), where C ⊆ V(H) and K : C → N. We refer to the vertices of C as the weighted vertices. For each partial weight assignment (H,C,K), we use the notations S = V(H) − C, h = |V(H)|, c = |C|, s = |S| and k = Σ_{a∈C} K(a). (H,C,K) is positive if K(a) ≠ 0 for any a ∈ C.

Given a partial weight assignment (H,C,K), χ : V(G) → V(H) is an (H,C,K)-coloring of G if χ is an H-coloring of G such that |χ^{−1}(a)| = K(a) for all a ∈ C. We denote by ℋ_{(H,C,K)}(G) the set of all the (H,C,K)-colorings of G; often we drop the subscript when (H,C,K) is clear from the context. For a fixed partial weight assignment (H,C,K), the (H,C,K)-coloring problem asks whether there exists an (H,C,K)-coloring of an input graph G. In this definition the parameters are the integers in the image of K. The parameterized independent set and vertex cover problems are particular instances of the (H,C,K)-coloring problem.
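To make the definition concrete, the following brute-force Python sketch enumerates the (H,C,K)-colorings of a small graph. It also shows one way, an encoding we assume for illustration, in which vertex cover fits the framework: take H with a looped weighted vertex a adjacent to an unlooped vertex b, C = {a} and K(a) = k. Vertices colored b then form an independent set, so χ^{−1}(a) is a vertex cover of size k. The search is exponential and only illustrates the definition. The names are hypothetical.

```python
from itertools import product

def is_h_coloring(g_edges, h_edges, chi):
    """chi is an H-coloring iff every edge of G is mapped onto an edge (or loop) of H."""
    return all(frozenset((chi[u], chi[v])) in h_edges for u, v in g_edges)

def count_hck_colorings(g_vertices, g_edges, h_vertices, h_edges, K):
    """Brute-force |H_(H,C,K)(G)|, where K maps each weighted vertex a in C to K(a)."""
    count = 0
    for image in product(h_vertices, repeat=len(g_vertices)):
        chi = dict(zip(g_vertices, image))
        if not is_h_coloring(g_edges, h_edges, chi):
            continue
        # Weight condition: |chi^{-1}(a)| = K(a) for every weighted vertex a.
        if all(sum(1 for v in g_vertices if chi[v] == a) == K[a] for a in K):
            count += 1
    return count

# Hypothetical vertex-cover encoding: loop at "a", edge {"a", "b"}, no loop at "b".
H_V = ["a", "b"]
H_E = {frozenset(("a",)), frozenset(("a", "b"))}
```

For the path 1–2–3, count_hck_colorings([1, 2, 3], [(1, 2), (2, 3)], H_V, H_E, {"a": 2}) counts the three vertex covers {1,2}, {2,3} and {1,3}.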

For a fixed graph H, given a graph G, an (H,G)-list is a function L : V(G) → 2^{V(H)} mapping any vertex of G to a subset of V(H). For any v ∈ V(G), the list of v is the set L(v) ⊆ V(H). The list H-coloring problem asks whether, given a graph G and an (H,G)-list L, there is an H-coloring χ of G such that χ(u) ∈ L(u) for every u ∈ V(G). This problem can be parameterized in the same way as the H-coloring to define the list (H,C,K)-coloring and the list (H,C,≤K)-coloring problems. We denote by ℋ_{(H,C,K)}(G,L) the set of all the list (H,C,K)-colorings of (G,L). As before, we will use ℋ(G,L) when (H,C,K) is clear from the context.

As usual, the #(H,C,K)-coloring and the list #(H,C,K)-coloring problems denote the counting versions of the corresponding decision problems.

Notice that, in contrast to the H-coloring, an (H,C,K)-coloring cannot, in general, be obtained as the disjoint union of (H,C,K)-colorings of the connected components of an input graph G. This is expressed in the next lemmas, whose proofs are left to the reader.

Lemma 1. Let G be a graph with connected components G_i, i = 1, ..., g, and let χ_i be an (H,C,K_i)-coloring of G_i for any i, 1 ≤ i ≤ g. Then χ = χ_1 ∪ ··· ∪ χ_g is an (H,C,K_1 + ··· + K_g)-coloring of G.

Lemma 2. Let (H,C,K) and (H',C',K') be two partial weight assignments. Let G and G' be two graphs, let χ be an (H,C,K)-coloring of G and let χ' be an (H',C',K')-coloring of G'. Then χ ∪ χ' is an (H ∪ H', C ∪ C', K ∪ K')-coloring of G ∪ G'.

Finally, we define some types of partial weight assignments on connected graphs. Let H be a connected graph. A partial weight assignment (H,C,K) is a 1-component if E(H − C) = ∅; a 2-component if H is a reflexive clique; a 3-component if H is a complete bipartite graph and C ⊆ Y(H); a 4-component if H[C] is a reflexive clique with all its vertices adjacent to one looped vertex of H − C; and a 5-component if H is bipartite, C ⊆ Y(H), there is a vertex x ∈ X(H) adjacent to all the vertices in C, and there is at least one vertex in Y(H) − C.

A partial weight assignment (H,C,K) is said to be simple when each connected component H' of H is either a 1-component, a 2-component or a 3-component; it is said to be plain when each connected component H' of H is either a 1-component, a 4-component or a 5-component. Observe that the class of simple partial weight assignments is a generalization of the class of compact partial weight assignments used in [5].



3 Counting (H, C, K)-Colorings of Connected Graphs

We introduce some vertex-weighted graphs and extend the notion of list (H,C,K)-coloring to them. A tribal graph is a graph G together with a positive weight assignment p : V(G) → N⁺. The vertices in the set T(G) = {v ∈ V(G) | p(v) > 1} are called the tribes of G, and the vertices in I(G) = V(G) − T(G) the individuals of G. The expansion of a tribal graph is the graph in which each v ∈ T(G) is replaced by p(v) copies of v. We associate a tribal graph with a subset of vertices of a graph, typically one with "few" vertices, and with an equivalence relation on the remaining vertices, whose classes are represented by tribes. Our compactors are obtained by restricting the maximum weight of a tribe in the following way.
For a nonnegative integer r, the r-restriction of a tribal graph (G,p) is the tribal graph (G,p') where p'(v) = min{p(v), r} for any v ∈ V(G). For the sake of readability we denote a tribal graph as G, eliminating the explicit reference to the node weights.
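A tribal graph can be stored as a weight map on the vertices of G. The sketch below, with hypothetical helper names, computes the tribes, the individuals and the r-restriction exactly as they are defined above (Python).

```python
def tribes(p):
    """T(G): the vertices whose weight is greater than 1."""
    return {v for v, weight in p.items() if weight > 1}

def individuals(p):
    """I(G) = V(G) - T(G): the vertices whose weight is exactly 1."""
    return {v for v, weight in p.items() if weight == 1}

def r_restriction(p, r):
    """Weights of the r-restriction: p'(v) = min{p(v), r}."""
    return {v: min(weight, r) for v, weight in p.items()}
```

The graph itself does not change; only the weights are capped. The 1-restriction, for example, turns every tribe back into an individual.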

A list (H,C,K)-coloring of a tribal graph G and an (H,G)-list L is a mapping w : V(G) × V(H) → ⟨k⟩ where: (1) for any v ∈ V(G) and a ∈ S, w(v,a) ≤ 1; (2) for any v ∈ V(G) and a ∈ C, w(v,a) ≤ K(a); (3) for any v ∈ V(G), 1 ≤ Σ_{a∈V(H)} w(v,a) ≤ p(v); (4) if {v,u} ∈ E(G), then {a,b} ∈ E(H) for all a, b ∈ V(H) with w(v,a) > 0 and w(u,b) > 0; (5) for any a ∈ C, Σ_{v∈V(G)} w(v,a) = K(a); and (6) for any v ∈ V(G) and a ∈ V(H) with w(v,a) > 0, a ∈ L(v).

Notice that in the case C = ∅ the above definition coincides with the concept of list H-coloring. Moreover, in case p(v) = 1 for any v ∈ V(G), we obtain a list (H,C,K)-coloring of (G,L) by assigning to each vertex v in G the unique vertex a ∈ V(H) for which w(v,a) = 1. Also, by removing condition (6) we get a definition of (H,C,K)-coloring of a tribal graph.

The following lemma collects some initial results on colorings of tribal graphs. We use ñ (m̃) to denote the number of vertices (edges) of a tribal graph G.

Lemma 3. Let (H,C,K) be a partial weight assignment. Let G be a tribal graph and let L be an (H,G)-list. Given w : V(G) × V(H) → ⟨k⟩, we can check whether w is a list (H,C,K)-coloring of (G,L) in time O(h²ñ + m̃). Furthermore, |{w : V(G) × V(H) → ⟨k⟩}| ≤ ((k+1)^c 2^s)^ñ.

Our next step is to introduce an equivalence relation on the vertices of G and to construct a tribal graph from it. Let (H,C,K) be a partial weight assignment. For a given graph G together with an (H,G)-list L, define 𝒫 to be the partition of V(G) induced by the equivalence relation

$$v \sim u \iff \bigl[N_G(v) = N_G(u) \wedge L(v) = L(u)\bigr].$$

For v ∈ V(G), P_v denotes the set {u | u ∼ v}, and for any Q ∈ 𝒫 we select a representative vertex v_Q ∈ Q.

In general, we keep all the vertices in some classes as individuals and one representative of each remaining class as a tribe. We say that R ⊆ V(G) is a closed set if P_v ⊆ R for any v ∈ R. Let R be a closed set. The tribal graph associated to (G,R) is G̃ = (G[R ∪ {v_Q | Q ∩ R = ∅}], p), where p(v) = 1 when v ∈ R, and p(v_Q) = |Q| for any Q with Q ∩ R = ∅.
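The partition 𝒫 and the tribal graph associated to (G,R) can be computed directly from the definitions. The Python sketch below is a naive version: it hashes every neighborhood and list and makes no attempt to match the running times claimed in the paper. The adjacency-dictionary representation and the function names are assumptions, and the first vertex of each class serves as v_Q.

```python
from collections import defaultdict

def twin_classes(adj, lists):
    """Partition V(G) by v ~ u iff N_G(v) = N_G(u) and L(v) = L(u)."""
    classes = defaultdict(list)
    for v in adj:
        classes[(frozenset(adj[v]), frozenset(lists[v]))].append(v)
    return list(classes.values())

def associated_tribal_graph(adj, lists, R):
    """Return (p, edges) of the tribal graph associated to (G, R).

    Vertices of the closed set R become individuals (weight 1); each class Q
    disjoint from R is represented by its first vertex v_Q with weight |Q|.
    """
    R = set(R)
    p = {}
    for Q in twin_classes(adj, lists):
        if set(Q) <= R:
            for v in Q:
                p[v] = 1
        elif not (set(Q) & R):
            p[Q[0]] = len(Q)
        else:
            raise ValueError("R is not a closed set")
    kept = set(p)
    # Edges of the induced subgraph G[R ∪ {v_Q}].
    edges = {frozenset((u, v)) for u in kept for v in adj[u] if v in kept}
    return p, edges
```

On a star with centre 0 and three leaves carrying identical lists, R = {0} is closed and the three leaves collapse into a single tribe of weight 3.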

Let G̃_{k+s} be the (k+s)-restriction of the tribal graph G̃ associated to G and some closed set R. The following lemmata show the properties that will allow us to use ℋ(G̃_{k+s}, L) as a compactor for ℋ(G,L) in several cases. Observe that V(G̃) can be seen as a subset of V(G), so we can keep the name L for the associated (H,G)-list in order to simplify notation.

Lemma 4. Let (H,C,K) be a partial weight assignment. Given a graph G together with an (H,G)-list L and a closed set R, let G̃ be the tribal graph associated to (G,R). Then there is a surjective function from ℋ(G,L) into ℋ(G̃_{k+s}, L).



Fixed Parameter Algorithms for Counting and Deciding 281 



Given a list (H,C,K)-coloring w of (G̃_{k+s}, L), let NE(w) = |{χ ∈ ℋ(G,L) | w = w_χ}|, where w_χ denotes the image of χ under the surjection of Lemma 4. For any v ∈ T(G̃), define the function W



W{w, v) 




riaec^"(^^a)! V cr 



Pv\-T 



,U|P„|-r-o 



where σ = |{a ∈ S | w(v,a) = 1}| and t = Σ_{a∈C} w(v,a). In the next result we show how to compute NE(w) using the function W. As is usual when dealing with a counting problem, we assume that all basic arithmetic operations take constant time.



Lemma 5. Let (H,C,K) be a partial weight assignment. Given a graph G together with an (H,G)-list L and a closed set R, let G̃ be the tribal graph associated to (G,R) and let w be a list (H,C,K)-coloring of (G̃_{k+s}, L). Then NE(w) = ∏_{v∈T(G̃)} W(w,v). Furthermore, for any v ∈ T(G̃), given w, G̃ and |P_v|, the value W(w,v) can be computed in O(log h + c log k + log n) time.

We have now shown most of the required compactor properties; it remains only to show that, for the particular classes of partial weight assignments, the obtained compactor is small. Before going into that, we present the central idea of the algorithm Count-List-Colorings, which we will adapt later to the different cases. Given w ∈ ℋ(G̃_{k+s}, L), the algorithm computes the number of possible extensions of w to a list (H,C,K)-coloring of (G,L); according to Lemmata 4 and 5, the overall sum of these numbers is the number of list (H,C,K)-colorings of (G,L).



function Count-List-Colorings(H, C, K, G, L, R)

input   A partial weight assignment (H, C, K) on H such that E(H − C) = ∅,
        a graph G, an (H, G)-list L and a closed set R ⊆ V(G)
output  The number of list (H, C, K)-colorings of (G, L)

begin
    GT = Tribal(H, C, K, G, L, R)
    GTR = Restrict(GT, k + s)
    ν = 0
    for all w : V(GTR) × V(H) → {0, ..., k} do
        if w ∈ ℋ(GTR, L) then
            ν = ν + ∏_{v ∈ T(GT)} W(w, v)
        end if
    end for
    return ν
end



Theorem 6. Let (H,C,K) be a partial weight assignment. Given a graph G together with an (H,G)-list L and a closed set R, algorithm Count-List-Colorings outputs the number of list (H,C,K)-colorings of (G,L) within O(n(n + h) + (n + h)2^{…} + (h²ñ + m̃)((k + 1)^c 2^s)^ñ) steps, where ñ (m̃) is the number of vertices (edges) of the tribal graph G̃ associated to (G,R).

In the following we specify the particular definition of the equivalence relation for the case of 1-components and describe the main results for 2- and 3-components.

In the case of 1-components the compactor is obtained after splitting the vertices of G according to degree and connectivity properties. Given a graph G and an integer k, the k-splitting of G is a partition of V(G) into three sets (R_1, R_2, R_3), where R_1 is the set of vertices in G of degree greater than k, R_2 is formed by the non-isolated vertices in G' = G[V(G) − R_1], and R_3 contains the isolated vertices in G'.

Lemma 7. Let (H,C,K) be a partial weight assignment where H − C is edgeless. Given a graph G, let (R_1, R_2, R_3) be the k-splitting of G. If |R_1| > k or |R_2| > k² + k, then ℋ(G,L) = ∅. Furthermore, the k-splitting of G can be computed in O(kn) steps.
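The k-splitting and the emptiness test of Lemma 7 are easy to implement. The following minimal Python sketch assumes that G is a loop-free graph given as an adjacency dictionary. The function names are ours. The test mirrors the classical vertex-cover kernel argument: because H − C has no edges, the preimage of C must be a vertex cover of G of size k.

```python
def k_splitting(adj, k):
    """Return (R1, R2, R3) for a loop-free graph given as {vertex: set_of_neighbours}."""
    R1 = {v for v in adj if len(adj[v]) > k}                   # degree greater than k
    rest = set(adj) - R1                                        # vertex set of G' = G[V(G) - R1]
    R2 = {v for v in rest if any(u in rest for u in adj[v])}    # non-isolated in G'
    R3 = rest - R2                                              # isolated in G'
    return R1, R2, R3

def passes_lemma7_test(adj, k):
    """False means that Lemma 7 already certifies that there are no colorings."""
    R1, R2, _ = k_splitting(adj, k)
    return len(R1) <= k and len(R2) <= k * k + k
```

A perfect matching with four edges fails the test for k = 1, since covering it needs four vertices.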

The next result is obtained by taking into account that all the vertices in R_3 have degree at most k and that N_G(v) ⊆ R_1 for any v ∈ R_3.

Lemma 8. Let (H,C,K) be a partial weight assignment where H − C is edgeless. Given a graph G and an (H,G)-list L, let (R_1, R_2, R_3) be the k-splitting of G. Then R = R_1 ∪ R_2 is a closed set. Furthermore, the tribal graph G̃ associated to (G,R) has at most k² + 2k + 2^{k+h} vertices and (k² + 2k)² + k2^{k+h} edges, and G̃ can be computed in O((h + k)n + (h + k)2^{k+h}) steps.

Using the previous lemmata we can compute |ℋ(G,L)| in FPT time.

Theorem 9. Let (H,C,K) be a partial weight assignment where H − C is edgeless. Given a graph G and an (H,G)-list L, let (R_1, R_2, R_3) be the k-splitting of G and let G̃ be the tribal graph associated to (G, R_1 ∪ R_2). Then |ℋ(G,L)| = Σ_{w∈ℋ(G̃_{k+s},L)} ∏_{v∈T(G̃)} W(w,v). Furthermore, |ℋ(G,L)| can be computed within O((h + k)n + … + (log h + c log k + log n)2^{k+h}((k + 1)^c 2^s)^{k²+2k+2^{k+h}}) steps.

In the case of 2-components we use the following splitting. Given a graph G together with an (H,G)-list L, the list splitting of (G,L) is a partition of V(G) into two sets (N_1, N_2), where N_1 = {v ∈ V(G) | L(v) ∩ S = ∅} and N_2 = V(G) − N_1. Let G' = (V(G), ∅).

Theorem 10. Let (H,C,K) be a partial weight assignment with H = K^r_h. Given an edgeless graph G and an (H,G)-list L, let (N_1, N_2) be the list splitting of (G,L). Then, if |N_1| > k, there are no (H,C,K)-colorings of (G,L). Otherwise, N_1 is a closed set, the tribal graph G̃ associated to (G', N_1) has at most k + 2^h vertices and no edges, and ℋ(G,L) = ℋ(G',L). Furthermore, |ℋ(G,L)| can be computed within O(hn + 2^h + (h²(k + 2^h) + (log h + c log k + log n)2^h)((k + 1)^c 2^s)^{k+2^h}) steps.
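The list splitting and the first test of Theorem 10 can be sketched in a few lines of Python. The sketch assumes that lists are Python sets of vertices of H and that S = V(H) − C. The function names are hypothetical.

```python
def list_splitting(lists, S):
    """(N1, N2): N1 holds the vertices whose list avoids S, so they need a weighted color."""
    N1 = {v for v, allowed in lists.items() if not (allowed & S)}
    return N1, set(lists) - N1

def too_many_forced(lists, S, k):
    """Theorem 10: if |N1| > k, there is no list (H, C, K)-coloring."""
    N1, _ = list_splitting(lists, S)
    return len(N1) > k
```

The test works because every vertex of N_1 consumes at least one unit of the total weight k = Σ_{a∈C} K(a).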
In the case of 3-components we use the following splitting. Let (H,C,K) be a partial weight assignment with H = K_{x,y} and C ⊆ Y(H). Given a bipartite graph G together with an (H,G)-list L, the bipartite splitting of (G,L) with respect to (H,C,K) is a partition of V(G) into three sets (M_1, M_2, M_3), where M_3 = {v ∈ V(G) | L(v) ∩ S ≠ ∅}, M_1 = X(G) − M_3 and M_2 = Y(G) − M_3. Given a connected bipartite graph G, let G' = (X(G) ∪ Y(G), X(G) × Y(G)).

Theorem 11. Let (H,C,K) be a partial weight assignment with H = K_{x,y} and C ⊆ Y(H). Given a complete bipartite graph G together with an (H,G)-list L, let (M_1, M_2, M_3) be the bipartite splitting of (G,L). Then, if |M_1| > 0 and |M_2| > 0, or if |M_1| + |M_2| > k, there are no list (H,C,K)-colorings of (G,L). Otherwise, M = M_1 ∪ M_2 is a closed set, the tribal graph G̃ associated to (G',M) has at most k + 4^h vertices and (k + 2^h)2^h edges and can be computed in O(hn + (h + 1)2^h) steps, and ℋ(G,L) = ℋ(G',L). Furthermore, |ℋ(G,L)| can be computed within O(hn + (h + 1)2^h + (h²(k + 4^h) + (k + 2^h)2^h + (log h + c log k + log n)4^h)((k + 1)^c 2^s)^{k+4^h}) steps.
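The bipartite splitting and the two infeasibility tests of Theorem 11 can be sketched the same way. This Python sketch assumes that X and Y are the two sides of G, that lists are sets, and that S = V(H) − C. The function names are hypothetical.

```python
def bipartite_splitting(X, Y, lists, S):
    """(M1, M2, M3): M3 holds the vertices that may receive an unweighted color."""
    M3 = {v for v in set(X) | set(Y) if lists[v] & S}
    return set(X) - M3, set(Y) - M3, M3

def bipartite_infeasible(M1, M2, k):
    """Theorem 11: no coloring if both M1 and M2 are nonempty, or if |M1| + |M2| > k."""
    return (len(M1) > 0 and len(M2) > 0) or len(M1) + len(M2) > k
```

Both sides cannot be forced into C ⊆ Y(H) at the same time, because a connected bipartite G maps its two sides onto different sides of H.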

4 Counting List (H, C, K)-Colorings for Simple Weight Assignments

In this section we provide an FPT algorithm for counting the number of list (H,C,K)-colorings when all the connected components of H are 1-, 2- or 3-components. The graph H is decomposed as the disjoint union of γ components H_{−ρ}, ..., H_η, where H_0 is the disjoint union of the 1-components, H_ℓ is a 2-component for ℓ ∈ [η], and H_ℓ is a 3-component for ℓ ∈ [−ρ].

Furthermore, the graph G may have several connected components, at least one for each component of H that contains a vertex in C. We assume that G has g connected components {G_1, ..., G_g}. Given an (H,G)-list L, we often need to speak of a list coloring of a component G_i; we refer to it as a (G_i,L)-coloring, with the understanding that the original list is restricted to the adequate sets. Furthermore, we assume that G_i is always mapped to a connected component of H.

We need further definitions to establish the form and properties of the compactor. For any i ∈ [g] and ℓ ∈ [−ρ,η], G̃_{iℓ} denotes the tribal graph obtained from (G_i, L_{iℓ}) through the splitting (k-splitting, list splitting or bipartite splitting) corresponding to the type of (H_ℓ, C_ℓ, K_ℓ). The vertices in G̃_{iℓ} are relabeled so that, if n_{iℓ} = |V(G̃_{iℓ})|, then V(G̃_{iℓ}) = [n_{iℓ}] and p_{iℓ}(j) ≤ p_{iℓ}(j + 1) for any j with 1 ≤ j < n_{iℓ}.

For any i ∈ [g], ℓ ∈ [−ρ,η] and w : [n_{iℓ}] × V(H_ℓ) → ⟨k_ℓ⟩, set β(i,ℓ,w) = 1 or 0 depending on whether or not w is a list H_ℓ-coloring of G̃_{iℓ}.

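For any ℓ ∈ [−ρ,η] we define the following equivalence relation on [g]:

$$i \sim_\ell j \iff n_{i\ell} = n_{j\ell} \;\wedge\; \forall v\in[n_{i\ell}]\ \ p_{i\ell}(v) = p_{j\ell}(v) \;\wedge\; \forall w : [n_{i\ell}]\times V(H_\ell)\to\langle k_\ell\rangle\ \ \beta(i,\ell,w) = \beta(j,\ell,w).$$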
We say that two components of G are equivalent when they are equivalent with respect to all the components of H. Formally, for i, j ∈ [g], i ∼ j iff i ∼_ℓ j for all ℓ ∈ [−ρ,η]. For i ∈ [g], let Q_i denote the equivalence class of i.

Let 𝒬 be the partition induced in [g] by ∼. Notice that we have at most

+ s)(fc^ + 2k + +2k+2 

equivalence classes. Each class can be represented by a sequence of γ triples (n^Q_ℓ, p^Q_ℓ, β^Q_ℓ), formed by an integer, a tribal weight and a binary string whose length is bounded by a function of h, k and s only; these represent the size of the vertex set, the sizes of the tribes and the associated mask, respectively.

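For any Q ∈ 𝒬 define

$$\mathcal{F}_0(Q) = \bigcup_{\ell\in[-\rho,\eta]}\bigl\{w : [n^Q_\ell]\times V(H_\ell)\to\langle k\rangle \bigm| \beta^Q_\ell(w) = 1 \text{ and } w^{-1}(C) = \emptyset\bigr\}, \text{ and}$$

$$\mathcal{F}_1(Q) = \bigcup_{\ell\in[-\rho,\eta]}\bigl\{w : [n^Q_\ell]\times V(H_\ell)\to\langle k\rangle \bigm| \beta^Q_\ell(w) = 1 \text{ and } w^{-1}(C) \neq \emptyset\bigr\}.$$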

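The compactor is defined as

$$\mathcal{T}(G,L) = \Bigl\{\bigl((A_Q,\alpha_Q,B_Q)\bigr)_{Q\in\mathcal{Q}} \Bigm| \forall Q\in\mathcal{Q}\ \ A_Q\subseteq\mathcal{F}_1(Q),\ \alpha_Q : A_Q\to[k],\ B_Q\subseteq\mathcal{F}_0(Q),\ \ \forall a\in C\ \ \sum_{Q\in\mathcal{Q}}\sum_{w\in A_Q}\alpha_Q(w)\,|w^{-1}(a)| = K(a)\Bigr\}.$$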

Lemma 12. Let (H,C,K) be a simple partial weight assignment. Given a graph G together with an (H,G)-list L, there is a surjective mapping from ℋ(G,L) into 𝒯(G,L).

It remains to show that the number of preimages, under the above function, of a sequence τ ∈ 𝒯(G,L) can be computed in FPT time. To do so, it is enough to guarantee that, for each equivalence class Q, the number of ways to assign the colorings in (A_Q, α_Q, B_Q) to all the members of Q can be computed in FPT time. We show this fact using a dynamic programming approach. Once this is done, all the properties of the compactor are fulfilled and we get

Theorem 13. Let (H,C,K) be a simple partial weight assignment. Given a graph G together with an (H,G)-list L, |ℋ(G,L)| can be computed in time O(γ(k + h)n + f_1(k,h)g + f_2(k,h) log n + f_3(k,h)) for suitable functions f_1, f_2 and f_3.



5 Decision Problems 

Let us consider the decision version of the list (H,C,K)-coloring problem for simple partial weight assignments. Here we can use a weak form of compactorization to obtain a faster FPT algorithm. Due to lack of space we only comment on the main ideas. For the case of connected graphs we compute the (k+1)-restriction of the






corresponding tribal graph G̃ (which depends on the type of component) according to the definitions given in Section 3, and show that there is a list (H,C,K)-coloring of (G,L) if and only if there is a list (H,C,K)-coloring of (G̃_{k+1}, L). The complexity of the algorithms is summarized in the following result.

Lemma 14. Let (H,C,K) be a partial weight assignment, let G be a graph, and let L be an (H,G)-list. When G has no edges or H = K^r_h, the existence of a list H-coloring of G can be decided in O(hn) steps, while the existence of a list (H,C,K)-coloring of (G,L) can be decided in O((k + h)n + … + k) steps. When H − C is edgeless, the existence of a list (H,C,K)-coloring of (G,L) can be decided in O(2^{…}[(k + h)n + … + k]) steps.

When G is not connected, we make use of a more involved compactor. However, as before, to check the existence of a list (H,C,K)-coloring of (G,L) we only need to check that the compactor is not empty. To define the compactor we use partial weight assignments for every component of H: for any ℓ ∈ [−ρ,η], define 𝒲_ℓ = {W : C_ℓ → N | 0 ≤ W ≤ K_ℓ}. Let k*_ℓ = ∏_{a∈C_ℓ}(K_ℓ(a) + 1). Notice that |𝒲_ℓ| = k*_ℓ and that k* = ∏_{a∈C}(K(a) + 1) = ∏_{ℓ∈[−ρ,η]} k*_ℓ. Finally, let 𝒲⁺_ℓ = 𝒲_ℓ − {0}.

Our next definition is the basis for classifying the components of G according to the combinations of components of H and partial weight assignments that admit correct list colorings. Given i ∈ [g], ℓ ∈ [−ρ,η] and W ∈ 𝒲_ℓ, define α(i,ℓ,W) = 1 or 0 depending on whether or not there is a list (H_ℓ, C_ℓ, W)-coloring of (G_i, L_{iℓ}).

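We isolate the information that a set of components of G has to capture in order to certify the existence of a list coloring. Given A ⊆ [g] and ℓ ∈ [−ρ,η], define

$$\mathcal{T}(A,\ell) = \Bigl\{(Z,f) \Bigm| Z\subseteq A,\ |Z|\le k,\ f : Z\to\mathcal{W}^+_\ell,\ \sum_{i\in Z} f(i) = K_\ell,\ \forall i\in Z\ \ \alpha(i,\ell,f(i)) = 1\Bigr\},$$

and the compactor is

$$\mathcal{F}(A) = \bigl\{((Z_{-\rho},f_{-\rho}),\ldots,(Z_\eta,f_\eta)) \bigm| Z_{-\rho},\ldots,Z_\eta \text{ form a nontrivial partition of } A,\ \forall\ell\in[-\rho,\eta]\ \ f_\ell : Z_\ell\to\mathcal{W}_\ell \wedge (Z_\ell,f_\ell)\in\mathcal{T}(A,\ell)\bigr\}.$$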

The main property of the compactor $\mathcal{F}(A)$ is stated in terms of the following notion. We say that $A \subseteq [g]$ is proper if $\mathcal{F}(A) \neq \emptyset$ and for any $i \in [g] - A$ there is $\ell \in [-p, \eta]$ such that $\alpha(i, \ell, 0) = 1$.

We establish an equivalence relation to reduce the number of candidate sets as follows:

$$i \sim j \iff \forall \ell \in [-p, \eta]\ \forall W \in \mathcal{W}_\ell\quad \alpha(i, \ell, W) = \alpha(j, \ell, W).$$

Let $\mathcal{P}$ be the partition induced by the equivalence classes. Notice that we have at most $2^{k^*}$ equivalence classes and that each class can be represented by a binary string of length $k^*$. Let $M \subseteq [g]$ be obtained by keeping $k + 1$ representatives from each class with more than k members, and the whole class otherwise.
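The reduction from [g] to the candidate set M can be sketched as below. This is an illustrative sketch under an assumed encoding: each component i of G is summarized by a hypothetical tuple `signature[i]` holding the bits α(i, ℓ, W) in one fixed order over all pairs (ℓ, W). Computing those bits (the list-coloring tests themselves) is not shown, and the function name is ours.

```python
from collections import defaultdict

def candidate_components(signature, k):
    """Build the candidate set M from the classes of i ~ j.

    signature: dict mapping a component index i to a tuple of 0/1 values
               alpha(i, l, W), listed in the same fixed order for every i
               (a hypothetical encoding, not taken from the text).
    k:         the parameter k.
    """
    classes = defaultdict(list)
    for i in sorted(signature):
        classes[signature[i]].append(i)      # equal signatures means i ~ j
    M = []
    for members in classes.values():
        if len(members) > k:
            M.extend(members[:k + 1])        # keep k + 1 representatives of a large class
        else:
            M.extend(members)                # keep a small class entirely
    return sorted(M)
```

For example, with k = 1 and components 1, 2, 3 sharing one signature while component 4 has another, the function keeps components 1, 2 and 4.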

Lemma 15. Let (H, C, K) be a simple partial weight assignment, and let G be a graph together with an (H, C)-list L. Then (G, L) has a list (H, C, K)-coloring if and only if there is a set $A \subseteq M$ that is proper.
Finally, by generating all the sets $A \subseteq M$ and all the elements in $\mathcal{F}(A)$ and
performing the adequate test, for suitable functions f '2 and of lower order 
than /2 and that appear in Theorem 13 respectively, we have 

Theorem 16. For simple partial weight assignment (H, C, K), the list (H, C, K)- 
coloring problem can be solved in time 0{'j{k + h)n+ h)g+ f^(k, h). 

Our results can be extended to provide an FPT algorithm for the (H, C, K)-coloring problem in the case of plain partial weight assignments. This goal is achieved through a series of parameterized reductions, from the general form of components to the small or tiny forms that are simple. We only need to extend the reductions presented in [2] to the case of 3-components.

Theorem 17. For any plain partial weight assignment (H, C, K), the (H, C, K)-coloring problem can be solved in $O(kn + f(k, h))$ steps.
Abstract. The main contribution of this paper is an optimal bounded space online
algorithm for variable-sized multidimensional packing. In this problem, hyper- 
boxes must be packed in d-dimensional bins of various sizes, and the goal is to 
minimize the total volume of the used bins. We show that the method used can 
also be extended to deal with the problem of resource augmented multidimensional 
packing, where the online algorithm has larger bins than the offline algorithm that 
it is compared to. Finally, we give new lower bounds for unbounded space multi- 
dimensional bin packing of hypercubes. 



1 Introduction 

In this paper, we consider the problem of online variable-sized multidimensional bin packing. In the d-dimensional box packing problem, we receive a sequence σ of items $p_1, p_2, \ldots, p_n$. Each item p has a fixed size, which is $s_1(p) \times \cdots \times s_d(p)$, i.e., $s_i(p)$ is the size of p in the ith dimension. We have an infinite number of bins, each of which is a d-dimensional unit hypercube. Each item must be assigned to a bin and a position $(x_1(p), \ldots, x_d(p))$, where $0 \le x_i(p)$ and $x_i(p) + s_i(p) \le 1$ for $1 \le i \le d$. Further, the positions must be assigned in such a way that no two items in the same bin overlap. A bin is empty if no item is assigned to it; otherwise it is used. The goal is to minimize the number of bins used. Note that for d = 1, the box packing problem reduces exactly to the classic bin packing problem.
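To make the feasibility conditions concrete, here is a minimal checker for a single bin. It assumes items are given as hypothetical (position, size) pairs of d-tuples; the `side` parameter (our addition) lets the same check be reused for a bin of edge length $a_j$ in the variable-sized setting. Exact rational arithmetic is used so that items that merely touch are not reported as overlapping.

```python
from fractions import Fraction
from itertools import combinations

def is_feasible_packing(boxes, side=1):
    """Check the packing constraints for one bin of edge length `side`.

    boxes: list of (position, size) pairs, each a d-tuple, i.e.
           (x_1(p), ..., x_d(p)) and (s_1(p), ..., s_d(p)).
    """
    # every item must lie inside the bin: 0 <= x_i(p) and x_i(p) + s_i(p) <= side
    for pos, size in boxes:
        if any(x < 0 or x + s > side for x, s in zip(pos, size)):
            return False
    # two boxes overlap iff their open intervals intersect in every dimension
    for (p, s), (q, t) in combinations(boxes, 2):
        if all(pi < qi + ti and qi < pi + si for pi, si, qi, ti in zip(p, s, q, t)):
            return False
    return True
```

For example, two boxes of size 1/2 × 1 placed at (0, 0) and (1/2, 0) form a feasible packing of the unit square, while moving the second box to (2/5, 0) makes them overlap.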

We focus on the variable-sized box packing problem, where bins of various sizes can be used to pack the items, and the goal is to minimize the total volume of all the bins used. To avoid cumbersome notation, we assume that the available bins are hypercubes, although our algorithm can also be extended to the case where the bins are hyperboxes. The available sizes of bins (edge lengths) are denoted by $a_1 < a_2 < \cdots < a_m = 1$.

Items arrive online, which means that each item must be assigned in turn, without knowledge of the next items. We consider bounded space algorithms, which have the property that they only have a constant number of bins available to accept items at any point during processing. The bounded space assumption is quite a natural one, especially so in online box packing. Essentially, the bounded space restriction guarantees that the output of packed bins is steady, and that the packer does not accumulate an enormous backlog of bins which are only output at the end of processing.

* Research supported by Israel Science Foundation (grant no. 250/01). 

** Research supported by the Netherlands Organization for Scientific Research (NWO), project 
number SION 612-30-002. 
The standard measure of algorithm quality for box packing is the asymptotic performance ratio, which we now define. For a given input sequence σ, let $\mathrm{cost}_A(\sigma)$ be the number of bins used by algorithm A on σ. Let $\mathrm{cost}(\sigma)$ be the minimum possible number of bins used to pack the items in σ. The asymptotic performance ratio for an algorithm A is defined to be

$$\mathcal{R}^\infty_A = \limsup_{n \to \infty}\ \sup_\sigma \left\{ \frac{\mathrm{cost}_A(\sigma)}{\mathrm{cost}(\sigma)} \;\middle|\; \mathrm{cost}(\sigma) = n \right\}.$$

Let $\mathcal{O}$ be some class of box packing algorithms (for instance, online algorithms). The optimal asymptotic performance ratio for $\mathcal{O}$ is defined to be $\mathcal{R}^\infty_{\mathcal{O}} = \inf_{A \in \mathcal{O}} \mathcal{R}^\infty_A$. Given $\mathcal{O}$, our goal is to find an algorithm with asymptotic performance ratio close to $\mathcal{R}^\infty_{\mathcal{O}}$. In this paper, we also consider the resource augmented box packing problem, where the online algorithm has larger bins at its disposal than the offline algorithm, and the goal is to minimize the (relative) number of bins used. Here all online bins are of the same size, and all the offline bins are of the same size, but these two sizes are not necessarily the same.

Previous Results: The classic online bin packing problem was first investigated by Ullman [17]. The lower bound currently stands at 1.54014, due to van Vliet [18]. Define

$$\pi_1 = 2, \qquad \pi_{i+1} = \pi_i(\pi_i - 1) + 1 \quad (i \ge 1), \qquad \text{and} \qquad \Pi_\infty = \sum_{i=1}^{\infty} \frac{1}{\pi_i - 1} \approx 1.69103.$$

Lee and Lee presented an algorithm called Harmonic, which classifies items into m > 1 classes based on their size and uses bounded space. For any ε > 0, there is a number m such that the Harmonic algorithm that uses m classes has a performance ratio of at most $(1 + \varepsilon)\Pi_\infty$ [11]. They also showed that there is no bounded space algorithm with a performance ratio below $\Pi_\infty$. Currently the best known unbounded space upper bound is 1.58889, due to Seiden [15].
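The constant $\Pi_\infty$ converges very quickly, and the bounded space behaviour of Harmonic is easy to see in code. The sketch below is illustrative only. The class boundaries (1/(k+1), 1/k] for k < m, with all items of size at most 1/m packed by Next Fit, are the usual description of Harmonic and are assumed here rather than taken from the text above. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def pi_infinity(terms=6):
    """Partial sum of Pi_inf = sum_{i>=1} 1/(pi_i - 1)."""
    p, total = 2, Fraction(0)               # pi_1 = 2
    for _ in range(terms):
        total += Fraction(1, p - 1)
        p = p * (p - 1) + 1                 # pi_{i+1} = pi_i (pi_i - 1) + 1
    return float(total)

def harmonic_bins(items, m):
    """Number of bins opened by a Harmonic-style algorithm with m classes.

    Assumed classes: an item x in (1/(k+1), 1/k] with k < m goes to class k,
    whose bins hold exactly k items; items x <= 1/m go to class m and are
    packed by Next Fit. Each class keeps at most one open bin (bounded space).
    """
    in_bin = {}     # class k < m -> number of items in its open bin
    level = None    # total size in the open bin of class m
    bins = 0
    for x in items:
        k = min(floor(1 / x), m)
        if k < m:
            if in_bin.get(k, 0) in (0, k):  # no open bin yet, or it is full
                bins += 1
                in_bin[k] = 0
            in_bin[k] += 1
        else:
            if level is None or level + x > 1:
                bins += 1
                level = Fraction(0)
            level += x
    return bins
```

For example, with m = 4, five items of size 2/5 fall into class 2 and occupy three bins, while at most one bin per class is ever open.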

The variable-sized bin packing problem was first investigated by Friesen and Langston [8]. Csirik [3] proposed the Variable Harmonic algorithm and showed that it has performance ratio at most $\Pi_\infty$. Seiden [14] showed that this algorithm is optimal among bounded space algorithms.

The resource augmented bin packing problem was studied by Csirik and Woeginger [6]. They showed that the optimal bounded space asymptotic performance ratio is a function ρ(b) of the online bin size b.

The online hyperbox packing problem was first investigated by Coppersmith and Raghavan [2], who gave an algorithm based on Next Fit with performance ratio 13/4 = 3.25 for d = 2. Csirik, Frenk and Labbe [4] gave an algorithm based on First Fit with performance ratio 49/16 = 3.0625 for d = 2. Csirik and van Vliet [5] presented an algorithm with performance ratio $(\Pi_\infty)^d$ for all $d \ge 2$ (2.85958 for d = 2). Even though this algorithm is based on Harmonic, it was not clear how to change it to bounded space. Li and Cheng [13] also gave a Harmonic-based algorithm for d = 2 and d = 3.




Seiden and van Stee [16] improved the upper bound for d = 2 to 2.66013. Several lower bounds have been shown [9,10,19,1]. The best lower bound for d = 2 is 1.907 [1], while the best lower bound for large d is less than 3. For bounded space algorithms, a lower bound of $(\Pi_\infty)^d$ is implied by [5]. A matching upper bound was shown in [7]. This was done by giving an extension of Harmonic which uses only bounded space. In that paper, a new technique of dealing with items that are relatively small in some of their dimensions was introduced.

Our Results: In this paper, we present a number of results for online multidimensional 
packing: 

- We present a bounded space algorithm for the variable-sized multidimensional bin 
packing problem. An interesting feature of the analysis is that although we show the 
algorithm is optimal, we do not know the exact asymptotic performance ratio. 

- We then give an analogous algorithm for the problem of resource augmented online bin packing. This algorithm is also optimal, with an asymptotic performance ratio of $\prod_{i=1}^d \rho(b_i)$, where $b_1 \times \cdots \times b_d$ is the size of the online bins.

- Additionally, we give a new general lower bound for unbounded space hypercube 
packing, where all items to be packed are hypercubes. We improve the bounds 
from [16]. 

For variable-sized packing, the method used in this paper generalizes and combines the methods used in [7] and [14]. There are some new ingredients that we use in this paper. An important difference with the previous papers is that in the current problem, it is no longer immediately clear which bin size should be used for any given item. In [7] all bins have the same size. In [14] bins have one dimension and therefore a simple greedy choice gives the best option. In our case, all the dimensions in which an item is “large” must be taken into account, and the process of choosing a bin is done by considering all these dimensions. This in turn changes the analysis: not only could the packing have been done in a different way by an offline algorithm, but items could also have been assigned to totally different bins (larger in some dimensions and smaller in others).

2 An Optimal Algorithm for Bounded Space Variable-Sized 
Packing 

In this section we consider the problem of multidimensional packing where the bins used can have different sizes. We assume that all bins are hypercubes, with sides $a_1 < a_2 < \cdots < a_m = 1$. In fact our algorithm is more general and works for the case where the bins are hyperboxes with dimensions $a_{ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, d$). We present the special case of bins that are hypercubes in this paper in order to avoid overburdened notation and messy technical details.

The main structure of the algorithm is identical to the one in [7]. The main problem 
in adapting that algorithm to the current problem is selecting the right bin size to pack 
the items in. In the one-dimensional variable-sized bin packing problem, it is easy to see 
which bin will accommodate any given item the best; here it is not so obvious how to 
select the right bin size, since in one dimension a bin of a certain size might seem best 
whereas for other dimensions, other bins seem more appropriate. 






L. Epstein and R. van Stee 



We begin by defining types for hyperboxes based on their components and the available bin sizes. The algorithm uses a parameter ε. The value of M as a function of ε is picked so that $M > 1/(1 - (1 - \varepsilon)^{1/(d+2)}) - 1$. An arriving hyperbox h of dimensions $(h_1, h_2, \ldots, h_d)$ is classified as one of at most $(2mM/a_1 - 1)^d$ types depending on its components; a type of a hyperbox is the vector of the types of its components. We define



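$$T_i = \left\{ \frac{a_i}{k} \;\middle|\; k \in \mathbb{N},\ \frac{a_i}{k} \ge \frac{a_1}{2M} \right\}, \qquad T = \bigcup_{i=1}^{m} T_i.$$

Let the members of T be $1 = t_1 > t_2 > \cdots > t_{q'} = a_1/M > \cdots > t_q = a_1/(2M)$. The interval $I_j$ is defined to be $(t_{j+1}, t_j]$ for $j = 1, \ldots, q' - 1$. Note that these intervals are disjoint and that they cover $(a_1/M, 1]$.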

A component larger than $a_1/M$ has type i if $h_i \in I_i$, and is called large. A component not larger than $a_1/M$ has type i, where $q' \le i \le q - 1$, if there exists a non-negative integer $f_i$ such that $t_{i+1} < 2^{f_i} h_i \le t_i$. Such components are called small. Thus in total there are $q - 1 \le 2mM/a_1 - 1$ component types.
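The classification of components can be sketched as follows. This is a minimal illustration, assuming exact rational bin sizes that include $a_m = 1$ and an integer M (so that $a_1/M$ and $a_1/(2M)$ are themselves members of T). The helper names `breakpoints` and `component_type` are hypothetical, and types are numbered from 1 as in the text.

```python
from fractions import Fraction

def breakpoints(sizes, M):
    """All a_i/k (k = 1, 2, ...) with a_i/k >= a_1/(2M), as t_1 > t_2 > ... > t_q."""
    a1 = min(sizes)
    lower = a1 / (2 * M)
    pts = set()
    for a in sizes:
        k = 1
        while a / k >= lower:
            pts.add(a / k)
            k += 1
    return sorted(pts, reverse=True)

def component_type(h, sizes, M):
    """Return (i, kind): the 1-based type index of component h and 'large' or 'small'."""
    if not 0 < h <= 1:
        raise ValueError("component must lie in (0, 1]")
    t = breakpoints(sizes, M)
    a1 = min(sizes)
    if h > a1 / M:
        # large component: find j with t_{j+1} < h <= t_j
        for j in range(len(t) - 1):
            if t[j + 1] < h <= t[j]:
                return j + 1, "large"
    else:
        # small component: scale by 2^f until it lies in (a_1/(2M), a_1/M]
        x = h
        while x <= a1 / (2 * M):
            x *= 2
        for i in range(t.index(a1 / M), len(t) - 1):
            if t[i + 1] < x <= t[i]:
                return i + 1, "small"
    raise ValueError("bin sizes must include 1 and M must be a positive integer")
```

With bin sizes 1/2 and 1 and M = 2, the breakpoints are 1, 1/2, ..., 1/8, giving 7 component types (within the bound $2mM/a_1 - 1 = 15$). A component of size 3/10 is large of type 3, and one of size 1/10 is small of type 5, since $2 \cdot 1/10$ lies in (1/6, 1/5].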



Bin selection. We now describe how to select a bin for a given type. Intuitively, the size of this bin is chosen in order to maximize the number of items packed relative to the volume used. This is done as follows.

For a given component type $s_i$ and bin size $a_j$, write $F(s_i, a_j) = \max\{k \mid a_j/k \ge t_{s_i}\}$. Thus for a large component, $F(s_i, a_j)$ is the number of times that a component of type $s_i$ fits in an interval of length $a_j$. This number is uniquely defined due to the definition of the numbers $t_i$. Basically, the general classification into types is too fine for any particular bin size, and we use $F(s_i, a_j)$ to get a less refined classification which only considers the points $t_i$ of the form $a_j/k$.

Denote by L the set of components in type $s = (s_1, \ldots, s_d)$ that are large. If $L = \emptyset$, we use a bin of size 1 for this type. Otherwise, we place this type in a bin of any size $a_j$ which maximizes¹

$$\prod_{i \in L} \frac{F(s_i, a_j)}{a_j}.$$

Thus we do not take small components into account in this formula. Note that for a small component, $F(s_i, a_j)$ is not necessarily the same as the number of times that such a component fits into any interval of length $a_j$. However, it is at least M for any small component.
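The bin selection rule can be sketched as below. The sketch takes the thresholds $t_{s_i}$ of the large components as input and uses the identity $\max\{k \mid a_j/k \ge t\} = \lfloor a_j/t \rfloor$ for F. When several sizes attain the maximum, it returns the first one listed, which is one admissible choice since the text allows any maximizer. Function names are illustrative.

```python
from fractions import Fraction
from math import prod

def F(t_s, a):
    """F(s, a) = max{k : a/k >= t_s} = floor(a / t_s), where t_s = t_{s_i}."""
    return a // t_s

def select_bin(large_thresholds, sizes):
    """Pick a bin size a_j maximizing prod_{i in L} F(s_i, a_j) / a_j.

    large_thresholds: the values t_{s_i} of the large components (the set L).
    sizes:            available bin sizes a_1 < ... < a_m = 1, as Fractions.
    """
    if not large_thresholds:
        return Fraction(1)                  # no large component: use a bin of size 1
    return max(sizes, key=lambda a: prod(F(t, a) / a for t in large_thresholds))
```

For instance, with bin sizes 3/5 and 1 and an item whose two large components have threshold 3/5, the bin of size 3/5 scores $(5/3)^2$ against 1 for the unit bin, so the smaller bin is chosen.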

When such a bin is opened, it is split into identical sub-bins of dimensions $(a_j/F(s_1, a_j), \ldots, a_j/F(s_d, a_j))$. These sub-bins are then further subdivided in order to place hyperboxes in “well-fitting” sub-bins, in the manner described in [7].

As in that paper, the following claim can now be shown. We omit the proof here due to space constraints.




¹ For the case that the bins are hyperboxes instead of hypercubes, we here get the formula $\prod_{i \in L} F(s_i, a_{ij})/a_{ij}$, and similar changes throughout the text.
Claim 1. The occupied volume in each closed bin of type $s = (s_1, \ldots, s_d)$ is at least

$$V_{s,j} = (1 - \varepsilon)\, a_j^d \prod_{i \in L} \frac{F(s_i, a_j)}{F(s_i, a_j) + 1},$$

where L is the set of large components in this type and $a_j$ is the bin size used to pack this type.



We now define a weighting function for our algorithm. The weight of a hyperbox h with components $(h_1, \ldots, h_d)$ and type $s = (s_1, \ldots, s_d)$ is defined as

$$w_\varepsilon(h) = \frac{1}{1 - \varepsilon} \left( \prod_{i \notin L} h_i \right) \left( \prod_{i \in L} \frac{a_j}{F(s_i, a_j)} \right),$$

where L is the set of large components in this type and $a_j$ is the size of bins used to pack this type.
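The volume guarantee and the weight fit together as follows: rounding each large component of h down to $a_j/(F(s_i, a_j) + 1)$ gives a box whose volume, multiplied by $a_j^d/V_{s,j}$, equals $w_\varepsilon(h)$. The sketch below checks this identity with exact arithmetic on one hypothetical two-dimensional item. The F values are supplied directly rather than computed from types, and the function names are ours.

```python
from fractions import Fraction
from math import prod

def volume_guarantee(F_large, a_j, d, eps):
    """V_{s,j} = (1 - eps) * a_j^d * prod over i in L of F/(F + 1)."""
    return (1 - eps) * a_j**d * prod(Fraction(f, f + 1) for f in F_large)

def weight(h, F, a_j, eps):
    """w_eps(h) for a hyperbox with components h (a tuple of Fractions).

    F: dict mapping each large coordinate i (the set L) to F(s_i, a_j);
       coordinates not in F are treated as small.
    """
    small = prod(x for i, x in enumerate(h) if i not in F)
    large = prod(a_j / f for f in F.values())
    return small * large / (1 - eps)
```

For example, with $a_j = 1$, ε = 1/10 and h = (2/5, 1/10), where only the first component is large with F = 2, the weight is 1/18 and $V_{s,j} = 3/5$; the rounded box (1/3, 1/10) has volume 1/30, and $(1/V_{s,j}) \cdot 1/30 = 1/18$.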

In order to prove that this weighting function works (gives a valid upper bound for the cost of our algorithm), we will want to modify each large component $h_i$ to the smallest possible value such that $F(s_i, a_j)$ does not change. (Basically, a component will be rounded to $a_j/(F(s_i, a_j) + 1)$ plus a small constant.) However, with variable-sized bins, when we modify components in this way, the algorithm might decide to pack the new hyperbox differently. (Remember that $F(s_i, a_j)$ is a “less refined” classification which does not take bin sizes other than $a_j$ into account.) To circumvent this technical difficulty, we will show first that as long as the algorithm keeps using the same bin size for a given item, the volume guarantee still holds.

For a given type $s = (s_1, \ldots, s_d)$ and the corresponding set L and bin size $a_j$, define an extended type $\mathrm{Ext}(s_1, \ldots, s_d)$ as follows: an item h is of extended type $\mathrm{Ext}(s_1, \ldots, s_d)$ if each large component $h_i \in \left( \frac{a_j}{F(s_i, a_j) + 1}, \frac{a_j}{F(s_i, a_j)} \right]$ and each small component $h_i$ is of type $s_i$.



Corollary 1. Suppose items of extended type $\mathrm{Ext}(s_1, \ldots, s_d)$ are packed into bins of size $a_j$. Then the occupied volume in each closed bin is at least $V_{s,j}$.

Proof. In the proof of Claim 1, we only use that each large component $h_i$ is contained in the interval $\left( \frac{a_j}{F(s_i, a_j) + 1}, \frac{a_j}{F(s_i, a_j)} \right]$. Thus the proof also works for extended types.

Lemma 1. For all input sequences σ, $\mathrm{cost}_{\mathrm{alg}}(\sigma) \le \sum_{h \in \sigma} w_\varepsilon(h) + O(1)$.

Proof. In order to prove the claim, it is sufficient to show that each closed bin of size $a_j$ contains items of total weight at least $a_j^d$, the cost of such a bin. Consider a bin of this size filled with hyperboxes of type $s = (s_1, \ldots, s_d)$. It is sufficient to consider the subsequence σ of the input that contains only items of this type, since all types are packed independently. This subsequence only uses bins of size $a_j$, so we may assume that no other sizes of bins are given. We build an input σ' for which both the behavior of the algorithm and the weights are the same as for σ, and show that the claim holds for σ'. Let δ > 0 be a very small constant.
For a hyperbox $h \in \sigma$ with components $(h_1, \ldots, h_d)$ and type $s = (s_1, \ldots, s_d)$, let $h' = (h'_1, \ldots, h'_d) \in \sigma'$ be defined as follows. For $i \notin L$, $h'_i = h_i$. For $i \in L$, $h'_i = a_j/(F(s_i, a_j) + 1) + \delta < a_j/F(s_i, a_j)$.

Note that h' is of extended type $\mathrm{Ext}(s_1, \ldots, s_d)$. Since only one size of bin is given, the algorithm packs σ' in the same way as it packs σ. Moreover, according to the definition of weight above, h and h' have the same weight.

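Let v(h) denote the volume of an item h. For $h \in \sigma$, we compute the ratio of weight and volume of the item h'. Since h' agrees with h outside L, we have

$$\frac{w_\varepsilon(h')}{v(h')} = \frac{w_\varepsilon(h)}{v(h')} = \frac{1}{1 - \varepsilon} \prod_{i \in L} \frac{a_j}{F(s_i, a_j)\, h'_i} = \frac{1}{1 - \varepsilon} \prod_{i \in L} \frac{F(s_i, a_j) + 1}{F(s_i, a_j) + F(s_i, a_j)\,(F(s_i, a_j) + 1)\,\delta / a_j}.$$

The extra term in each denominator is at most a constant times δ: a component with a large type fits less than M times in a (one-dimensional) bin of size $a_1$, and therefore less than $M a_j/a_1$ times in a bin of size $a_j \ge a_1$, so $F(s_i, a_j) < M a_j/a_1$. As δ tends to zero, this ratio therefore approaches $\frac{1}{1 - \varepsilon} \prod_{i \in L} \frac{F(s_i, a_j) + 1}{F(s_i, a_j)} = a_j^d / V_{s,j}$. Since neither the weights nor the packing depend on δ, Corollary 1 implies that the total weight of items in a closed bin of size $a_j$ is no smaller than $a_j^d$, which is the cost of such a bin.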

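Suppose the optimal solution for a given input sequence σ uses $n_j$ bins of size $a_j$. Denote the ith bin of size $a_j$ by $B_{i,j}$. Then

$$\frac{\sum_{h \in \sigma} w_\varepsilon(h)}{\sum_{j=1}^m a_j^d n_j} = \frac{\sum_{j=1}^m \sum_{i=1}^{n_j} \sum_{h \in B_{i,j}} w_\varepsilon(h)}{\sum_{j=1}^m \sum_{i=1}^{n_j} a_j^d} \le \max_{j} \max_{1 \le i \le n_j} \frac{\sum_{h \in B_{i,j}} w_\varepsilon(h)}{a_j^d}.$$

This implies that the asymptotic worst case ratio is upper bounded by

$$\max_j \max_{X_j} \sum_{h \in X_j} w_\varepsilon(h) / a_j^d, \qquad (2)$$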



where the second maximum is taken over all sets $X_j$ that can be packed in a bin of size $a_j$. As in [7], it can now be shown that this weighting function is also “optimal” in that it determines the true asymptotic performance ratio of our algorithm.

In particular, it can be shown that packing a set of hyperboxes X that have the same type vectors of large and small dimensions takes at least

$$\sum_{h \in X} \prod_{i \notin L} \frac{h_i}{a_j} \prod_{i \in L} \frac{1}{F(s_i, a_j)}$$

bins of size $a_j$, where $h_i$ is the ith component of hyperbox h, $s_i$ is the type of the ith component, and L is the set of large components (for all the hyperboxes in X). Since
the cost of such a bin is $a_j^d$, this means that the total cost to pack N' copies of some item h is at least $N' w_\varepsilon(h)(1 - \varepsilon)$ when bins of this size are used. However, it is clear that using bins of another size $a_k$ does not help: packing N' copies of h into such bins would give a total cost of

$$N' \prod_{i \notin L} h_i \prod_{i \in L} \frac{a_k}{F(s_i, a_k)}.$$

Since $a_j$ was chosen to maximize $\prod_{i \in L} F(s_i, a_j)/a_j$, this expression cannot be less than $N' w_\varepsilon(h)(1 - \varepsilon)$. More precisely, any bins that are not of size $a_j$ can be replaced by the appropriate number of bins of size $a_j$ without increasing the total cost by more than 1 (it can increase by 1 due to rounding).

This implies that our algorithm is optimal among online bounded space algorithms. 



3 An Optimal Algorithm for Bounded Space Resource Augmented 
Packing 



The resource augmented problem is now relatively simple to solve. In this case, the online algorithm has bins at its disposal that are hyperboxes of dimensions $b_1 \times b_2 \times \cdots \times b_d$. We can use the algorithm from [7] with the following modification: the types for dimension j are not based on intervals of the form $(1/(i+1), 1/i]$ but rather on intervals of the form $(b_j/(i+1), b_j/i]$.

Then, to pack items of type $s = (s_1, \ldots, s_d)$, a bin is split into $\prod_{j=1}^d s_j$ identical sub-bins of dimensions $(b_1/s_1, \ldots, b_d/s_d)$, and then subdivided further as necessary. We now find that each closed bin of type $s = (s_1, \ldots, s_d)$ is full by at least

$$(1 - \varepsilon)\, B \prod_{i \in L} \frac{s_i}{s_i + 1},$$

where L is the set of large components in this type, and $B = \prod_{j=1}^d b_j$ is the volume of the online bins.

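We now define the weight of a hyperbox h with components $(h_1, \ldots, h_d)$ and type $s = (s_1, \ldots, s_d)$ as

$$w_\varepsilon(h) = \frac{1}{1 - \varepsilon} \prod_{i \notin L} \frac{h_i}{b_i} \prod_{i \in L} \frac{1}{s_i},$$

where L is the set of large components in this type.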

This can be shown to be valid similarly to before, and it can also be shown that items cannot be packed better. However, in this case we are additionally able to give explicit bounds for the asymptotic performance ratio.






3.1 The Asymptotic Performance Ratio 

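Csirik and Woeginger [6] showed the following for the one-dimensional case.

For a given bin size b, define an infinite sequence $T(b) = \{t_1, t_2, \ldots\}$ of positive integers as follows:

$$t_1 = \lfloor 1 + b \rfloor \quad \text{and} \quad r_1 = \frac{1}{b} - \frac{1}{t_1},$$

and for $i = 1, 2, \ldots$

$$t_{i+1} = \left\lfloor 1 + \frac{1}{r_i} \right\rfloor \quad \text{and} \quad r_{i+1} = r_i - \frac{1}{t_{i+1}}.$$

Define

$$\rho(b) = \sum_{i=1}^{\infty} \frac{1}{t_i - 1}.$$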

Lemma 2. For every bin size $b \ge 1$, there exist online bounded space bin packing algorithms with worst case performance arbitrarily close to $\rho(b)$. For every bin size $b \ge 1$, the bound $\rho(b)$ cannot be beaten by any online bounded space bin packing algorithm.
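The sequence T(b) and the value ρ(b) can be computed directly from the recurrence. The sketch below truncates the series after a few terms (the $t_i$ grow roughly by squaring, so the tail is negligible) and keeps $r_i$ as exact fractions. The product over dimensions gives the ratio $\prod_{i} \rho(b_i)$ stated for resource augmented packing. Function names and the default number of terms are our choices.

```python
from fractions import Fraction
from math import floor, prod

def rho(b, terms=8):
    """Truncated rho(b) = sum_i 1/(t_i - 1) for an online bin size b >= 1."""
    b = Fraction(b)
    if b < 1:
        raise ValueError("bin size must be at least 1")
    t = floor(1 + b)                  # t_1 = floor(1 + b)
    r = 1 / b - Fraction(1, t)        # r_1 = 1/b - 1/t_1
    total = Fraction(1, t - 1)
    for _ in range(terms - 1):
        t = floor(1 + 1 / r)          # t_{i+1} = floor(1 + 1/r_i)
        r -= Fraction(1, t)           # r_{i+1} = r_i - 1/t_{i+1}
        total += Fraction(1, t - 1)
    return float(total)

def augmented_ratio(bs, terms=8):
    """prod_{i=1}^d rho(b_i) for online bins of size b_1 x ... x b_d."""
    return prod(rho(b, terms) for b in bs)
```

For b = 1 the recurrence yields $t_i = 2, 3, 7, 43, \ldots$, so $\rho(1) = \Pi_\infty \approx 1.69103$ and `augmented_ratio([1, 1])` is about 2.85958; for b = 2 it gives $\rho(2) \approx 0.69103$.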

The following lemma is proved in Csirik and van Vliet [5] for a specific weighting function which is independent of the dimension, and is similar to a result of Li and Cheng [12]. However, the proof holds for any positive one-dimensional weighting function w. We extend it to the case where the weighting function depends on the dimension. For a one-dimensional weighting function $w_j$ and an input sequence σ, define $w_j(\sigma) = \sum_{h \in \sigma} w_j(h)$. Furthermore, define $W_j = \sup_\sigma w_j(\sigma)$, where the supremum is taken over all sequences that can be packed into a one-dimensional bin.

Lemma 3. Let σ be a list of d-dimensional rectangles, and let Q be a packing which packs these rectangles into a d-dimensional unit cube. For each $h \in \sigma$, we define a new hyperbox h' as follows: $s_j(h') = w_j(s_j(h))$ for $1 \le j \le d$. Denote the resulting list of hyperboxes by σ'. Then there exists a packing Q' which packs σ' into a cube of size $(W_1, \ldots, W_d)$.

Proof. We use a construction analogous to the one in [5]. We transform the packing $Q = Q^0$ of σ into a packing $Q^d$ of σ' in a cube of the desired dimensions. This is done in d steps, one for each dimension. Denote the coordinates of item h in packing $Q^i$ by $(x^i_1(h), \ldots, x^i_d(h))$, and its dimensions by $(s^i_1(h), \ldots, s^i_d(h))$.

In step i, the coordinates as well as the sizes in dimension i are adjusted as follows. First we adjust the sizes and set $s^i_i(h) = w_i(s_i(h))$ for every item h, leaving other dimensions unchanged.

To adjust coordinates, for each item h in packing $Q^{i-1}$ we find the “left-touching” items, which is the set of items g which overlap with h in $d - 1$ dimensions, and for which $x^{i-1}_i(g) + s^{i-1}_i(g) = x^{i-1}_i(h)$. We may assume that for each item h, there is either a left-touching item or $x^{i-1}_i(h) = 0$.

Then, for each item h that has no left-touching items, we set $x^i_i(h) = 0$. For all other items h, starting with the ones with smallest i-coordinate, we make the i-coordinate equal to $\max_g (x^i_i(g) + s^i_i(g))$, where the maximum is taken over the left-touching items g of h in packing $Q^{i-1}$. Note that we use the new coordinates and sizes of left-touching items in this construction, and that this creates a packing without overlap.

If in any step i the items need more than $W_i$ room, this implies a chain of left-touching items with total size less than 1 but total weight more than $W_i$. From this we can find a set of one-dimensional items that fit in a bin but have total weight more than $W_i$ (using weighting function $w_i$), which is a contradiction.
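One step of the construction in the proof of Lemma 3 can be sketched as follows. This is an illustrative sketch, not the authors' code: boxes are hypothetical (position, size) pairs, overlap in the other $d - 1$ dimensions is tested on open intervals, and exact fractions are used so that the touching condition $x_i(g) + s_i(g) = x_i(h)$ can be tested with equality. Running it once per dimension (0-based index i) produces $Q^d$.

```python
from fractions import Fraction

def compact_dimension(boxes, i, w):
    """Step i of the construction: packing Q^{i-1} -> Q^i.

    boxes: list of (position, size) pairs of d-tuples (the packing Q^{i-1}).
    w:     the one-dimensional weighting function w_i applied to sizes.
    Returns a list of (position, size) pairs of lists (the packing Q^i).
    """
    n, d = len(boxes), len(boxes[0][0])

    def overlap_elsewhere(g, h):
        # g and h overlap (as open intervals) in all dimensions other than i
        (pg, sg), (ph, sh) = boxes[g], boxes[h]
        return all(pg[k] < ph[k] + sh[k] and ph[k] < pg[k] + sg[k]
                   for k in range(d) if k != i)

    # left-touching items of each h in Q^{i-1}
    left = {h: [g for g in range(n) if g != h and overlap_elsewhere(g, h)
                and boxes[g][0][i] + boxes[g][1][i] == boxes[h][0][i]]
            for h in range(n)}

    pos = [list(p) for p, _ in boxes]
    size = [list(s) for _, s in boxes]
    for h in range(n):
        size[h][i] = w(boxes[h][1][i])      # s^i_i(h) = w_i(s_i(h))
    # smallest old i-coordinate first, so left-touching items are already placed
    for h in sorted(range(n), key=lambda h: boxes[h][0][i]):
        pos[h][i] = max((pos[g][i] + size[g][i] for g in left[h]), default=0)
    return list(zip(pos, size))
```

For example, two boxes of size 1/2 × 1 side by side in the unit square, with w(x) = 2x in the first dimension, end up at first coordinates 0 and 1 with width 1 each, i.e. inside a box of width 2.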

As in [5], this implies immediately that the total weight that can be packed into a unit-sized bin is upper bounded by $\prod_{i=1}^d W_i$, which in the present case is $\prod_{i=1}^d \rho(b_i)$. Moreover, by extending the lower bound from [6] to d dimensions exactly as in [5], it can be seen that the asymptotic performance ratio of any online bounded space bin packing algorithm also cannot be lower than $\prod_{i=1}^d \rho(b_i)$.

4 Lower Bound for Unbounded Space Hypercube Packing 

We construct a sequence of items to prove a general lower bound for hypercube packing in d dimensions. In this problem, all items to be packed are hypercubes. Take an integer $\ell \ge 1$. Let ε > 0 be a number smaller than $1/2^\ell - 1/(2^\ell + 1)$. The input sequence is defined as follows. We use a large integer N. Let

Xi = (2^+1-* - 1)'^ - (2^+i-‘ i 

xo = 2“ - (2^ - 1)'^ 

In step 0, we let $N x_0$ items of size $s_0 = (1 + \varepsilon)/(2^\ell + 1)$ arrive. In step $i = 1, \ldots, \ell$, we let $N x_i$ items of size $s_i = (1 + \varepsilon)/2^{\ell + 1 - i}$ arrive. For $i = 1, \ldots, \ell - 1$, item size $s_i$ in this sequence divides all the item sizes $s_{i+1}, \ldots, s_\ell$. Furthermore,

$$\sum_{i=0}^{\ell} s_i = (1 + \varepsilon)\left(1 - \frac{1}{2^\ell} + \frac{1}{2^\ell + 1}\right) < 1$$

by our choice of ε.

The online algorithm receives steps $0, \ldots, k$ of this input sequence for some (unknown) $0 \le k \le \ell$.
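The properties of these item sizes can be checked with exact arithmetic. The sketch assumes ℓ = 3 and picks ε as half of the allowed maximum $1/2^\ell - 1/(2^\ell + 1)$. The per-dimension counts $\lfloor 1/s_i \rfloor$ only describe an axis-aligned grid packing; the fact that no packing does better is the claim cited from [7]. The function name is hypothetical.

```python
from fractions import Fraction

def lower_bound_sizes(ell, eps):
    """Item sizes [s_0, s_1, ..., s_ell] of the lower-bound input sequence."""
    if not (ell >= 1 and 0 < eps < Fraction(1, 2**ell) - Fraction(1, 2**ell + 1)):
        raise ValueError("need ell >= 1 and 0 < eps < 1/2^ell - 1/(2^ell + 1)")
    s0 = (1 + eps) / (2**ell + 1)
    return [s0] + [(1 + eps) / 2**(ell + 1 - i) for i in range(1, ell + 1)]

ell = 3
eps = (Fraction(1, 2**ell) - Fraction(1, 2**ell + 1)) / 2   # half the allowed maximum
s = lower_bound_sizes(ell, eps)

divides = all((s[j] / s[i]).denominator == 1                  # s_i divides s_j
              for i in range(1, ell + 1) for j in range(i + 1, ell + 1))
fits_together = sum(s) < 1                                    # one item of each size fits
per_dim = [1 // x for x in s]                                 # floor(1/s_i) copies per edge
print(divides, fits_together, per_dim)                        # True True [8, 7, 3, 1]
```

The counts match $2^\ell$ for $s_0$ and $2^{\ell+1-i} - 1$ for $s_i$ with $i \ge 1$.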

A pattern is a multiset of items that fits (in some way) in a unit bin. A pattern is dominant if, when we increase the number of items of the smallest size in that pattern, the resulting multiset no longer fits in a unit bin. A pattern is greedy if the largest item in it appears as many times as it can fit in a bin, and this also holds for all smaller items, each time taking the larger items that are in the pattern into account. Note that not all possible sizes of items need to be present in the bin.

For a pattern P of items that all have sizes in $\{s_0, \ldots, s_\ell\}$, denote the number of items of size $s_i$ by $P_i$.

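Lemma 4. For any pattern P for items with sizes in $\{s_0, \ldots, s_\ell\}$,

$$P_i + \sum_{j=i+1}^{\ell} 2^{(j-i)d} P_j \le (2^{\ell+1-i} - 1)^d \quad (1 \le i \le \ell), \qquad P_0 + \sum_{j=1}^{\ell} 2^{(j-1)d} P_j \le 2^{\ell d}. \qquad (3)$$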
Proof. Suppose (3) does not hold for some i, and consider the smallest i for which it does not hold. Consider an item of size $s_j$ with j > i that appears in P. If there is no such j, we have a contradiction, since at most $(2^{\ell+1-i} - 1)^d$ items of size $s_i$ fit in the unit hypercube for $i \ge 1$, or at most $2^{\ell d}$ items of size $s_0$. This follows from Claim 4 in [7].

First suppose i > 0. If we replace an item of size $s_j$ with $2^{(j-i)d}$ items of size $s_i$, the resulting pattern is still feasible: all the new size $s_i$ items can be placed inside the hypercube that this size $s_j$ item has vacated.

We can do this for all items of size $s_j$, j > i, that appear in the pattern. This results in a pattern with only items of size $s_i$ or smaller. Since every size $s_j$ item is replaced by $2^{(j-i)d}$ items of size $s_i$, the final pattern has more than $(2^{\ell+1-i} - 1)^d$ items of size $s_i$, a contradiction.

Now suppose i = 0. In this case $\lfloor (2^\ell + 1)/2^{\ell+1-j} \rfloor = 2^{j-1}$ items of size $s_0$ fit along each edge of a hypercube of size $s_j$, and the proof continues analogously.

We define a canonical packing for dominant patterns. We create a grid in the bin as follows. One corner of the bin is designated as the origin O. We assign a coordinate system to the bin, where each positive axis is along some edge of the bin. The grid points are those points inside the bin that have all coordinates of the form $m_1(1 + \varepsilon)/2^{m_2}$ for $m_1, m_2 \in \mathbb{N} \cup \{0\}$.

We pack the items in order of decreasing size. Each item of size $s_i$ ($i = 1, \ldots, \ell$) is placed at the available grid point that has all coordinates smaller than $1 - s_i$, all coordinates equal to a multiple of $s_i$, and is closest to the origin. So the first item is placed in the origin. Each item of size $s_0$ is placed at the available grid point that has all coordinates equal to a multiple of $s_1$ (and not $s_0$) and is closest to the origin. Note that for these items we also use grid points with some coordinates equal to $(2^\ell - 1)(1 + \varepsilon)/2^\ell$, unlike for items of size $s_1$. This is feasible because $(2^\ell - 1)(1 + \varepsilon)/2^\ell + (1 + \varepsilon)/(2^\ell + 1) = (1 + \varepsilon)(1 - 1/2^\ell + 1/(2^\ell + 1)) < 1$.

In each step i we can place a number of items which is equal to the upper bound in Lemma 4. For $i = 1, \ldots, \ell$, this is because a larger item of size $s_j$ takes away exactly $2^{(j-i)d}$ grid points that are multiples of $s_i$, and originally there are $(2^{\ell+1-i} - 1)^d$ grid points of this form that have all coordinates smaller than $1 - s_i$. For i = 0, $2^{(j-1)d}$ grid points that are multiples of $s_1$ are taken away by an item of size $s_j$, from an original supply of $2^{\ell d}$. This shows that all patterns can indeed be packed in canonical form.
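The item counts of a greedy pattern follow from this grid counting: for $i \ge 1$ there are $(2^{\ell+1-i} - 1)^d$ grid points for size $s_i$ and each item of size $s_j$ (j > i) removes $2^{(j-i)d}$ of them, while for i = 0 there are $2^{\ell d}$ points and each item of size $s_j$ ($j \ge 1$) removes $2^{(j-1)d}$. The sketch below fills the types in `present` from largest to smallest using exactly these counts; the function name is hypothetical.

```python
def greedy_pattern(ell, d, present):
    """Counts P_i of the greedy pattern that uses exactly the types in `present`."""
    P = {}
    for i in sorted(present, reverse=True):        # largest type first
        if i >= 1:
            slots = (2 ** (ell + 1 - i) - 1) ** d
            used = sum(2 ** ((j - i) * d) * P[j] for j in P)
        else:
            slots = 2 ** (ell * d)
            used = sum(2 ** ((j - 1) * d) * P[j] for j in P)
        P[i] = slots - used                        # fill every remaining grid point
    return P
```

For ℓ = 2 and d = 2, a greedy pattern using all three types contains one item of size $s_2$, five of size $s_1$ and seven of size $s_0$.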

Lemma 5. For this set of item sizes, any dominant pattern that is not greedy is a convex combination of dominant patterns that are greedy.

Proof. We use induction to construct a convex combination of greedy patterns for a given dominant pattern P. The induction hypothesis is as follows: the vector that describes the numbers of items of the t smallest types which appear in the pattern is a convex combination of greedy vectors for these types. Call such a pattern t-greedy.

The base case is t = 1. We consider the items of the smallest type that occurs in P. Since P is dominant, for this type we have that as many items as possible appear in P, given the larger items. Thus P is 1-greedy.

We now prove the induction step. Suppose that in P, items of type i appear fewer times than they could, given the larger items. Moreover, P contains items of some smaller type. Let i' be the largest smaller type in P. By induction, we only need to consider
patterns in which all the items of type less than i that appear, appear as many times as possible, starting with items of type i'. (All other patterns are convex combinations of such patterns.)

We define two patterns P' and P'' such that P is a convex combination of them. First suppose i' > 0. P' is defined as follows: modify P by removing all items of type i and adding the largest smaller item that appears in P, of type i', $2^{(i-i')d}$ times per each item of type i. When creating P', we thus add the maximum number of items of type i' that can fit for each removed item of type i. P is greedy with respect to all smaller items, and $s_{i'}$ divides $s_i$. Therefore the multiset P' defined in this way is a pattern, and is (t + 1)-greedy.

P'' on the other hand is created by adding items of type i to P and removing items of type i'. In particular, in the canonical packing for P, at each grid point for type i that is not removed due to a higher-type item, we place an item of size $s_i$ and remove all items that overlap with this item. Since all items smaller than $s_i$ appear as many times as possible given the larger items, all the removed items are of the next smaller type i' that appears in P. This holds because the items are packed in order of decreasing size, so this area will certainly be filled with items of type i' if the item of size $s_i$ is not there.

In P'', the number of items of type i is now maximized given items of higher types. Only type i' items are removed, and only enough to make room for type i, so type i' remains greedy. Thus P'' is (t + 1)-greedy. Each time that we add an item of type i, we remove exactly $2^{(i-i')d}$ items of type i'. So by adding an item of type i in creating P'', we remove exactly the same number of items of type i' as we add when we remove an item of type i while creating P'. Therefore, P is a convex combination of P' and P'', and we are done.

Now suppose i' = 0. (This is a special case of the induction step for t + 1 = 2.) In this case, in P' each item of type i is replaced by $2^{(i-1)d}$ items of type 0. In the canonical packing, this is exactly the number of type 1 grid points that become available when removing an item of type i. Thus in P', the number of type 0 items is maximal (since it is sufficient to use type 1 grid points for them), and P' is 2-greedy.

Similarly, it can be seen that to create P'', we need to remove $2^{(i-1)d}$ items of type 0 in the canonical packing in order to place each item of type i. Then P'' is 2-greedy, and again P is a convex combination of P' and P''.



We can now formulate a linear program to lower bound the asymptotic performance 
ratio of any unbounded space online algorithm as in [16]. We will need the offline cost 
to pack any prefix of the full sequence. This is calculated as follows. 

To pack the items of the largest type k, which have size (1 + we need 

Nxk/{2.^^^~^ — 1)®* bins because there are Nxk such items. These are all packed 
identically: using a canonical packing, we pack as many items of smaller types in bins 
with these items as possible. (Thus we use a greedy pattern.) Some items of types
1, ..., k − 1 still remain to be packed. It is straightforward to calculate the number of
items of type k − 1 that still need to be packed, and how many bins this takes. We
continue in the same manner until all items are packed. 

Solving this linear program for ^ = 11 and several values of d gives us the following 
results. The first row contains the previously known lower bounds. 
d      1        2        3         4        5        6        7        8        9        10
old    1.5401   1.6217   1.60185   1.5569   (bounds here were below 1.5569)
new    (1.5)    1.6406   1.6680    1.6775   1.6840   1.6887   1.6920   1.6943   1.6959   1.6973
Abstract. We prove that all planar Laman graphs (i.e. minimally generically
rigid graphs with a non-crossing planar embedding) can be generated from a
single edge by a sequence of vertex splits. It has been shown recently [6,12]
that a graph has a pointed pseudo-triangular embedding if and only if it is a
planar Laman graph. Due to this connection, our result gives a new tool for
attacking problems in the area of pseudo-triangulations and related geometric
objects. One advantage of vertex splitting over alternate constructions, such
as edge-splitting, is that vertex splitting is geometrically more local.

We also give new inductive constructions for duals of planar Laman graphs and
for planar generically rigid graphs containing a unique rigidity circuit. Our
constructions can be found in O(n^3) time, which matches the best running time
bound that has been achieved for other inductive constructions.

1 Introduction 

The characterization of graphs for rigidity circuits in the plane and isostatic
graphs in the plane has received significant attention in the last few years
[2,3,8,9,14]. The special case of planar graphs and their non-crossing realizations
has been a particular focus, in part because special inductive constructions apply
[3], and in part because of special geometric realizations as pseudo-triangulations
and related geometric objects [6,11,12,13].

In [3] it was observed that all 3-connected planar rigidity circuits can be
generated from K_4 by a sequence of vertex splits, preserving planarity, 3-connectivity
and the circuit property in every intermediate step, a result that follows by
duality from the construction of 3-connected planar rigidity circuits by edge splits
[2]. We extend this result in two ways. We show that all planar Laman graphs
(bases for the rigidity matroid) can be generated from K_2 (a single edge) by a
sequence of vertex splits, and all planar generically rigid graphs (spanning sets of
the rigidity matroid) containing a unique rigidity circuit can be generated from
K_4 (a complete graph on four vertices) by a sequence of vertex splits.

* Supported by the MTA-ELTE Egerváry Research Group on Combinatorial Optimization,
and the Hungarian Scientific Research Fund grant no. F034930, T037547,
and FKFP grant no. 0143/2001.

One advantage of vertex splitting over alternate constructions, such as edge- 
splitting, is that vertex splitting is geometrically more local. With very local 
moves, one can better ensure the planarity of a class of realizations of the re- 
sulting graph. This feature can be applied to give an alternate proof that each 
planar Laman graph can be realized as a pointed pseudo-triangulation, and that 
each planar generically rigid graph with a unique rigidity circuit can be realized 
as a pseudo-triangulation with a single non-pointed vertex. 

A graph G = (V, E) is a Laman graph if |V| ≥ 2, |E| = 2|V| − 3, and

$$ i(X) \le 2|X| - 3 \qquad (1) $$

holds for all X ⊆ V with |X| ≥ 2, where i(X) denotes the number of edges
induced by X. Laman graphs, also known as isostatic or generically minimally 
rigid graphs, play a key role in 2-dimensional rigidity, see [5,7,8,10,14,16]. By 
Laman’s Theorem [9] a graph embedded on a generic set of points in the plane is 
infinitesimally rigid if and only if it is Laman. Laman graphs correspond to bases 
of the 2-dimensional rigidity matroid [16] and occur in a number of geometric 
problems (e.g. unique realizability [8], straightening polygonal linkages [12], etc.).
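For small graphs the definition can be tested directly. The Python sketch below is our own illustration, and the helper names are hypothetical rather than taken from the paper. It checks the edge count and then tests every X ⊆ V with |X| ≥ 2 against condition (1). This takes exponential time, so it is only meant for toy examples.

```python
from itertools import combinations


def induced_edges(edges, xs):
    """i(X): number of edges with both end-vertices in X."""
    xs = set(xs)
    return sum(1 for u, v in edges if u in xs and v in xs)


def is_laman(n, edges):
    """Brute-force Laman test on vertices 0..n-1 (exponential in n)."""
    if n < 2 or len(edges) != 2 * n - 3:
        return False
    for k in range(2, n + 1):
        for xs in combinations(range(n), k):
            if induced_edges(edges, xs) > 2 * k - 3:   # condition (1) violated
                return False
    return True


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = list(combinations(range(4), 2))
k33 = [(a, b) for a in range(3) for b in range(3, 6)]
print(is_laman(3, triangle), is_laman(4, k4), is_laman(6, k33))  # True False True
```

K_{3,3} passes the test even though it contains no triangle.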

Laman graphs have a well-known inductive construction, called Henneberg 
construction. Starting from an edge (a Laman graph on two vertices), construct 
a graph by adding new vertices one by one, by using one of the following two 
operations: 

(i) add a new vertex and connect it to two distinct old vertices via two new
edges (vertex addition)

(ii) remove an old edge, add a new vertex, and connect it to the end vertices
of the removed edge and to a third old vertex which is not incident with the
removed edge (edge splitting)

It is easy to check that a graph constructed by these operations is Laman. 
The more difficult part is to show that every Laman graph can be obtained this 
way. 
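The following is a minimal Python sketch of the two Henneberg operations. It assumes vertices 0, ..., n − 1 with edges stored as pairs. The functions and the random choice of steps are illustrative assumptions. A compact brute-force test of condition (1) is included so that the sketch can confirm, on small outputs, that every graph built this way is Laman.

```python
import random
from itertools import combinations


def vertex_addition(n, edges, a, b):
    """Henneberg step (i): new vertex n joined to distinct old vertices a, b."""
    assert a != b
    return n + 1, edges + [(a, n), (b, n)]


def edge_splitting(n, edges, a, b, c):
    """Henneberg step (ii): delete edge ab, join new vertex n to a, b and c."""
    assert c not in (a, b)
    rest = [e for e in edges if set(e) != {a, b}]
    assert len(rest) == len(edges) - 1, "ab must be an edge"
    return n + 1, rest + [(a, n), (b, n), (c, n)]


def random_henneberg(steps, seed=0):
    """Start from a single edge and apply randomly chosen Henneberg steps."""
    rng = random.Random(seed)
    n, edges = 2, [(0, 1)]
    for _ in range(steps):
        if n < 3 or rng.random() < 0.5:      # edge splitting needs a third vertex
            a, b = rng.sample(range(n), 2)
            n, edges = vertex_addition(n, edges, a, b)
        else:
            a, b = rng.choice(edges)
            c = rng.choice([x for x in range(n) if x not in (a, b)])
            n, edges = edge_splitting(n, edges, a, b, c)
    return n, edges


def is_laman(n, edges):
    """Brute-force check of |E| = 2|V| - 3 and condition (1); small n only."""
    if len(edges) != 2 * n - 3:
        return False
    return all(
        sum(1 for u, v in edges if u in xs and v in xs) <= 2 * len(xs) - 3
        for k in range(2, n + 1)
        for xs in map(set, combinations(range(n), k))
    )


n, edges = random_henneberg(6)
print(n, len(edges), is_laman(n, edges))   # 8 13 True
```

Each step adds one vertex and, net, two edges, so the count |E| = 2|V| − 3 is preserved automatically. Only condition (1) needs the brute-force check.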

Theorem 1. [14,8] A graph is Laman if and only if it has a Henneberg
construction.

An embedding G(P) of the graph G on a set of points P = {p_1, ..., p_n} ⊂ ℝ² is
a mapping of vertices v_i ∈ V to points p_i ∈ P in the Euclidean plane. The edges
are mapped to straight line segments. Vertex v_i of the embedding G(P) is pointed
if all its adjacent edges lie on one side of some line through p_i. An embedding
G(P) is non-crossing if no pair of segments, corresponding to independent edges
of G, have a point in common, and segments corresponding to adjacent edges
have only their endvertices in common. A graph G is planar if it has a
non-crossing embedding. By a plane graph we mean a planar graph together with a
non-crossing embedding.
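Pointedness can be decided from the directions of the incident edges alone. Under our reading, the edges lie strictly on one side of a line through p_i exactly when some angle between cyclically consecutive edges exceeds π. The sketch below defines a hypothetical helper. It assumes no two incident edges point in the same direction, sorts the edge directions with `math.atan2`, and looks for such a gap.

```python
import math


def is_pointed(p, neighbours):
    """True if the edges from p to its neighbours leave an angle > pi at p."""
    if len(neighbours) <= 1:
        return True
    angles = sorted(math.atan2(q[1] - p[1], q[0] - p[0]) for q in neighbours)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])   # wrap-around gap
    return max(gaps) > math.pi


print(is_pointed((0, 0), [(1, 0), (0, 1)]),
      is_pointed((0, 0), [(1, 0), (-1, 1), (-1, -1)]))   # True False
```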
A pseudo-triangle is a simple planar polygon with exactly three convex
vertices. A pseudo-triangulation of a planar set of points P is a non-crossing
embedded graph G(P) whose outer face is convex and all interior faces are
pseudo-triangles. In a pointed pseudo-triangulation all vertices are pointed. A pointed-
plus-one pseudo-triangulation is a pseudo-triangulation with exactly one non- 
pointed vertex. The following result, due to Streinu, generated several exciting 
questions in computational geometry. 
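Deciding whether a polygon is a pseudo-triangle reduces to counting its convex vertices. The sketch assumes a simple polygon listed counter-clockwise but checks neither property. It uses the sign of a cross product to detect a left turn and treats collinear corners as non-convex. The function names are ours.

```python
def convex_vertices(polygon):
    """Indices of convex corners of a simple polygon given counter-clockwise."""
    n = len(polygon)
    result = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross > 0:            # left turn: interior angle smaller than pi
            result.append(i)
    return result


def is_pseudo_triangle(polygon):
    """A simple polygon with exactly three convex vertices."""
    return len(convex_vertices(polygon)) == 3


# A triangle whose bottom side is pushed inwards at (3, 1).
print(is_pseudo_triangle([(0, 0), (3, 1), (6, 0), (3, 6)]))   # True
```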

Theorem 2. [12] Let G be embedded on the set P = {p_1, ..., p_n} of points. If G
is a pointed pseudo-triangulation of P then G is a planar Laman graph. 

One of the first questions has been answered by the next theorem. The proof 
used the following topological version of the Henneberg construction, which is 
easy to deduce from Theorem 1. 

Lemma 1. Every plane Laman graph has a plane Henneberg construction, 
where all intermediate graphs are plane, and at each step the topology is changed 
only on edges and faces involved in the Henneberg step. In addition, if the outer 
face of the plane graph is a triangle, there is a Henneberg construction starting 
from that triangle. 



Theorem 3. [6] Every planar Laman graph can be embedded as a pointed 
pseudo-triangulation. 




Fig. 1. The plane vertex splitting operation applied to a plane Laman graph on edge
xy with partition E_1 = {xu_1, xu_2}, E_2 = {xu_3}.



Our main theorem is a different inductive construction for plane Laman
graphs. It can also be used to prove Theorem 3 and it might be more suitable
than the Henneberg construction for certain geometric problems. The
construction uses (the topological version of) vertex splittings, called plane vertex
splitting. This operation picks an edge xy in a plane graph, partitions the edges
incident to x (except xy) into two consecutive sets E_1, E_2 of edges (with respect
to the natural cyclic ordering given by the embedding), replaces x by two
vertices x_1, x_2, attaches the edges in E_1 to x_1, attaches the edges in E_2 to x_2, and
adds the edges yx_1, yx_2, x_1x_2, see Figure 1. It is easy to see that plane vertex
splitting, when applied to a plane Laman graph, yields a plane Laman graph.
Note that the standard version of vertex splitting (see [15,16] for its applications 
in rigidity theory), where the partition of the edges incident to x is arbitrary, 
preserves the property of being Laman, but may destroy planarity. 
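To make plane vertex splitting concrete, the sketch below stores a plane graph combinatorially as a rotation system. Each vertex maps to its neighbours in counter-clockwise order. This representation is our assumption, not one used in the paper. We read "two consecutive sets" as the two arcs into which xy and the new edge x_1x_2 cut the cyclic order at x. Under that reading, a split is determined by the number s of neighbours that follow y counter-clockwise and go to x_1. Faces are traced from the rotations, and Euler's formula V − E + F = 2 then confirms that the result is still a plane (genus-zero) embedding.

```python
from typing import Dict, List

Rotation = Dict[int, List[int]]   # vertex -> neighbours in counter-clockwise order


def plane_vertex_split(rot: Rotation, x: int, y: int, s: int, x1: int, x2: int) -> Rotation:
    """Split x along xy: the s neighbours following y (CCW) go to x1, the rest to x2."""
    k = rot[x].index(y)
    seq = rot[x][k + 1:] + rot[x][:k]           # neighbours of x except y, CCW from y
    e1, e2 = seq[:s], seq[s:]
    new = {v: list(nbrs) for v, nbrs in rot.items() if v != x}
    for a in e1:
        new[a] = [x1 if b == x else b for b in new[a]]
    for a in e2:
        new[a] = [x2 if b == x else b for b in new[a]]
    j = new[y].index(x)
    new[y][j:j + 1] = [x1, x2]                   # x1 precedes x2 CCW around y
    new[x1] = [y] + e1 + [x2]
    new[x2] = [x1] + e2 + [y]
    return new


def count_faces(rot: Rotation) -> int:
    """Trace face boundaries: dart (a, b) is followed by (b, w), where w
    precedes a in the counter-clockwise order around b."""
    seen, faces = set(), 0
    for u in rot:
        for v in rot[u]:
            if (u, v) in seen:
                continue
            faces += 1
            a, b = u, v
            while (a, b) not in seen:
                seen.add((a, b))
                nbrs = rot[b]
                a, b = b, nbrs[(nbrs.index(a) - 1) % len(nbrs)]
    return faces


rot = {0: [1], 1: [0]}                          # a single edge
rot = plane_vertex_split(rot, 0, 1, 0, 2, 3)    # a triangle on 1, 2, 3
rot = plane_vertex_split(rot, 1, 2, 1, 4, 5)
v = len(rot)
e = sum(map(len, rot.values())) // 2
print(v, e, v - e + count_faces(rot))           # 4 5 2
```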

We shall prove that every plane Laman graph can be obtained from an edge 
by plane vertex splittings. To prove this we need to show that the inverse opera- 
tion of plane vertex splitting can be performed on every plane Laman graph on 
at least three vertices in such a way that the graph remains (plane and) Laman. 
The inverse operation contracts an edge of a triangle face. 

Let e = uv be an edge of G. The operation contracting the edge e identifies
the two end-vertices of e and deletes the resulting loop as well as one edge from
each of the resulting pairs of parallel edges, if there exist any. The graph obtained 
from G by contracting e is denoted by G/e. We say that e is contractible in a 
Laman graph G if G/e is also Laman. Observe that by contracting an edge e 
the number of vertices is decreased by one, and the number of edges is decreased 
by the number of triangles that contain e plus one. Thus a contractible edge 
belongs to exactly one triangle of G. 
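Contraction as defined above is easy to express on an edge set. Merge v into u, let the loop disappear, and let a set absorb parallel copies. Combined with a brute-force Laman test, this gives an exponential-time check of contractibility for small examples. The function names are ours. On K_4 minus an edge, the edge 02 lies in one triangle and is contractible, while 01 lies in two triangles and is not. In both cases the number of edges drops by the number of triangles through the edge plus one.

```python
from itertools import combinations


def is_laman(vertices, edges):
    """Brute-force Laman test; edges is a set of 2-element frozensets."""
    if len(vertices) < 2 or len(edges) != 2 * len(vertices) - 3:
        return False
    return all(
        sum(1 for e in edges if e <= set(xs)) <= 2 * k - 3
        for k in range(2, len(vertices) + 1)
        for xs in combinations(sorted(vertices), k)
    )


def contract(vertices, edges, u, v):
    """G/e for e = uv: merge v into u, drop the loop, keep one of each parallel pair."""
    new_vertices = set(vertices) - {v}
    new_edges = set()
    for e in edges:
        merged = frozenset(u if w == v else w for w in e)
        if len(merged) == 2:          # the contracted edge itself collapses to a point
            new_edges.add(merged)     # the set keeps only one copy of parallel edges
    return new_vertices, new_edges


V = {0, 1, 2, 3}
E = {frozenset(p) for p in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]}  # K4 minus 23
for u, v in [(0, 2), (0, 1)]:
    V2, E2 = contract(V, E, u, v)
    print((u, v), len(E) - len(E2), is_laman(V2, E2))
# (0, 2) 2 True
# (0, 1) 3 False
```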

Laman graphs in general do not necessarily contain triangles, see e.g. K_{3,3}.
For plane Laman graphs we can use Euler’s formula to deduce the following. 

Lemma 2. Every plane Laman graph G = (V, E) with |E| ≥ 4 contains at least
two triangle faces (with distinct boundaries). 

It is easy to observe that an edge of a triangle face of a plane Laman graph 
is not necessarily contractible. In addition, a triangle face may contain no con- 
tractible edges at all. See Figure 2 and Figure 3 for examples. This is one reason 
why the proof of the inductive construction via vertex splitting is more difficult 
than that of Theorem 1, where the corresponding inverse operations of vertex 
addition or edge splitting can be performed at every vertex of degree two or 
three. 




Fig. 2. A Laman graph G and a non-contractible edge ab on a triangle face abc. The
graph obtained by contracting ab satisfies (1), but it has fewer edges than it should have.
No edge on abc is contractible, but edges ad and cd are contractible in G. 
Fig. 3. A Laman graph G and a non-contractible edge uv on a triangle face uvw. The
graph obtained by contracting uv has the right number of edges but it violates (1). 



The paper is organised as follows. In Section 2 we prove some basic facts 
about Laman graphs. In Section 3 we turn to plane Laman graphs and complete 
the proof of our main result. Sections 4 and 5 contain some corollaries concerning 
pseudo-triangulations and further inductive constructions obtained by planar 
duality. Algorithmic issues are discussed in Section 6. 

2 Preliminary Results on Laman Graphs 

In this section we introduce the basic definitions and notation, and prove some 
preliminary lemmas on (not necessarily planar) Laman graphs. 

Let G = (V, E) be a graph. For a subset X ⊆ V let G[X] denote the subgraph
of G induced by X. For a pair X, Y ⊆ V let d_G(X, Y) denote the number of
edges from X − Y to Y − X. We omit the subscript if the graph is clear from
the context. The following equality is well-known. It is easy to check by counting
the contribution of an edge to each of the two sides.

Lemma 3. Let G = (V, E) be a graph and let X, Y ⊆ V. Then

$$ i(X) + i(Y) + d(X, Y) = i(X \cup Y) + i(X \cap Y). $$
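Because Lemma 3 is a pure counting identity, it can be spot-checked on random graphs and random pairs X, Y. The sketch below uses arbitrary parameters of our choosing. It is a sanity check, not a proof.

```python
import random


def i_count(edges, xs):
    """i(X): edges with both ends in X."""
    return sum(1 for u, v in edges if u in xs and v in xs)


def d_count(edges, xs, ys):
    """d(X, Y): edges between X - Y and Y - X."""
    a, b = xs - ys, ys - xs
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))


def check_lemma3(n=8, p=0.4, trials=200, seed=1):
    """Test the identity on one random graph and many random pairs X, Y."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]
    for _ in range(trials):
        xs = {v for v in range(n) if rng.random() < 0.5}
        ys = {v for v in range(n) if rng.random() < 0.5}
        lhs = i_count(edges, xs) + i_count(edges, ys) + d_count(edges, xs, ys)
        rhs = i_count(edges, xs | ys) + i_count(edges, xs & ys)
        if lhs != rhs:
            return False
    return True


print(check_lemma3())   # True
```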

Let G = (V, E) be a Laman graph. It is easy to see that Laman graphs
are simple (i.e., they contain no multiple edges) and 2-vertex-connected. A set
X ⊆ V is critical in G if i(X) = 2|X| − 3, that is, if X satisfies (1) with equality.
Equivalently, X is critical if and only if G[X] is Laman. Thus V as well as the
set {u, v} for all uv ∈ E are critical. The next lemma is easy to deduce from
Lemma 3.

Lemma 4. Let G = (V, E) be a Laman graph and let X, Y ⊆ V be critical sets
in G with |X ∩ Y| ≥ 2. Then X ∩ Y and X ∪ Y are also critical and d(X, Y) = 0
holds.
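Lemma 4 can likewise be checked exhaustively on a small Laman graph. The sketch below uses the triangular prism, which consists of two triangles joined by a perfect matching (6 vertices, 9 edges). It lists all critical sets by brute force and verifies the three conclusions for every pair of critical sets sharing at least two vertices. The example graph and helper names are our own choices.

```python
from itertools import combinations

# Triangular prism: triangles 012 and 345 joined by the matching 03, 14, 25.
N = 6
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]


def i_count(xs):
    """i(X) in the prism."""
    return sum(1 for u, v in EDGES if u in xs and v in xs)


def d_count(xs, ys):
    """d(X, Y) in the prism."""
    a, b = xs - ys, ys - xs
    return sum(1 for u, v in EDGES if (u in a and v in b) or (u in b and v in a))


def critical_sets():
    """All X with |X| >= 2 and i(X) = 2|X| - 3, found by brute force."""
    return [frozenset(xs) for k in range(2, N + 1)
            for xs in combinations(range(N), k)
            if i_count(set(xs)) == 2 * k - 3]


def check_lemma4():
    """Check the conclusions of Lemma 4 for all pairs with |X & Y| >= 2."""
    crit = set(critical_sets())
    for x, y in combinations(crit, 2):
        if len(x & y) >= 2:
            if x & y not in crit or x | y not in crit or d_count(x, y) != 0:
                return False
    return True


print(check_lemma4())   # True
```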

Lemma 5. Let G = (V, E) be a Laman graph and let X ⊆ V be a critical set.
Let C be the union of some of the connected components of G − X. Then X ∪ C
is critical.
Proof. Let C_1, C_2, ..., C_k be the connected components of G − X and let X_i =
X ∪ C_i, for 1 ≤ i ≤ k. We have X_i ∩ X_j = X and d(X_i, X_j) = 0 for all
1 ≤ i < j ≤ k, and X_1 ∪ ... ∪ X_k = V. Since G is Laman and X is critical, we can
count as follows:

$$
\begin{aligned}
2|V| - 3 = |E| &= i(X_1 \cup X_2 \cup \dots \cup X_k) = \sum_{i=1}^{k} i(X_i) - (k-1)\,i(X) \\
&\le \sum_{i=1}^{k} \bigl(2|X_i| - 3\bigr) - (k-1)\bigl(2|X| - 3\bigr) \\
&= 2\sum_{i=1}^{k} |X_i| - 2(k-1)|X| - 3k + 3(k-1) = 2|V| - 3,
\end{aligned}
$$

where the last step uses |X_1| + ... + |X_k| = |V| + (k − 1)|X|.
Thus equality must hold everywhere, and hence each X_i is critical.

Now Lemma 4 (and the fact that |X| ≥ 2) implies that if C is the union of
some of the components of G − X then X ∪ C is critical. □

The next lemma characterises the contractible edges in a Laman graph. 

Lemma 6. Let G = (V, E) be a Laman graph and let e = uv ∈ E. Then e is
contractible if and only if there is a unique triangle uvw in G containing e and
there exists no critical set X in G with u, v ∈ X, w ∉ X, and |X| ≥ 4.

Proof. First suppose that e is contractible. Then G/e is Laman, and, as we noted
earlier, e must belong to a unique triangle uvw. For a contradiction suppose that
X is a critical set with u, v ∈ X, w ∉ X, and |X| ≥ 4. Then e is an edge of
G[X] but it does not belong to any triangle in G[X]. Hence by contracting e we
decrease the number of vertices and edges in G[X] by one. This would make the
vertex set of G[X]/e violate (1) in G/e. Thus such a critical set cannot exist.

To see the 'if' direction suppose that there is a unique triangle uvw in G
containing e and there exists no critical set X in G with u, v ∈ X, w ∉ X, and
|X| ≥ 4. For a contradiction suppose that G' := G/e is not Laman. Let v' denote
the vertex of G' obtained by contracting e. Since G is Laman and e belongs to
exactly one triangle in G, it follows that |E(G')| = 2|V(G')| − 3, so there is a
set Y ⊆ V(G') with |Y| ≥ 2 and i_{G'}(Y) ≥ 2|Y| − 2. Since G' is simple and uv
belongs to a unique triangle in G, it follows that V(G'), all two-element subsets
of V(G'), all subsets containing v' and w, as well as all subsets not containing
v' satisfy (1) in G'. Thus we must have |Y| ≥ 3, v' ∈ Y and w ∉ Y. Hence
X := (Y − v') ∪ {u, v} is a critical set in G with u, v ∈ X, w ∉ X, and |X| ≥ 4,
a contradiction. This completes the proof of the lemma. □

Thus two kinds of substructures can make an edge e = uv of a triangle uvw
non-contractible: a triangle uvw' with w' ≠ w and a critical set X with u, v ∈ X,
w ∉ X and |X| ≥ 4. Since a triangle is also critical, these substructures can be
treated simultaneously. We say that a critical set X ⊆ V is a blocker of edge
e = uv (with respect to the triangle uvw) if u, v ∈ X, w ∉ X and |X| ≥ 3.

Lemma 7. Let uvw be a triangle in a Laman graph G = (V, E) and suppose that
e = uv is non-contractible. Then there exists a unique maximal blocker X of e
with respect to uvw. Furthermore, G − X has precisely one connected component.

Proof. There is a blocker of e with respect to uvw by Lemma 6. By Lemma 4 the
union of two blockers of e, with respect to uvw, is also a blocker with respect to
uvw. This proves the first assertion. The second one follows from Lemma 5: let C
be the union of those components of G − X that do not contain w, where X is the
maximal blocker of e with respect to uvw. Since X ∪ C is critical, and does not
contain w, it is also a blocker of e with respect to uvw. By the maximality of X we
must have C = ∅. Thus G − X has only one component (which contains w). □

Since a blocker X is a critical set in G, G[X] is also Laman.

Lemma 8. Let G = (V, E) be a Laman graph, let uvw be a triangle, and let
f = uv be a non-contractible edge. Let X be the maximal blocker of f with
respect to uvw. If an edge e ≠ f of G[X] is contractible in G[X], then it is contractible in G.

Proof. Let e = rz. Since e is contractible in G[X], there exists a unique triangle
rzy in G[X] which contains e. For a contradiction suppose that e is not
contractible in G. Then by Lemma 6 there exists a blocker of e with respect to rzy,
that is, a critical set Z ⊆ V with r, z ∈ Z, y ∉ Z, and |Z| ≥ 3. Lemma 4 implies
that Z ∩ X is critical. If |Z ∩ X| ≥ 3 then Z ∩ X is a blocker of e in G[X],
contradicting the fact that e is contractible in G[X].

Thus Z ∩ X = {r, z}. We claim that w ∉ Z. To see this suppose that w ∈ Z
holds. Then w ∈ Z − X. Since e ≠ f, and |Z ∩ X| = 2, at least one of u, v is not
in Z. But this would imply d(X, Z) ≥ 1, contradicting Lemma 4. This proves
w ∉ Z.

Clearly, Z − X ≠ ∅. Thus, since Z ∪ X is critical by Lemma 4, it follows that
Z ∪ X is a blocker of f in G with respect to uvw, contradicting the maximality
of X. This proves the lemma. □



3 Plane Laman Graphs 

Lemma 9. Let G = (V, E) be a plane Laman graph, let uvw be a triangle face,
and let f = uv be a non-contractible edge. Let X be the maximal blocker of f
with respect to uvw. Then all but one of the faces of G[X] are faces of G.

Proof. Consider the faces of G[X] and the connected component C of G − X,
which is unique by Lemma 7. Clearly, C is within one of the faces of G[X]. Thus
all faces, except the one which has w in its interior, are faces of G, too. □

The exceptional face of G[X] (which is not a face of G) is called the special 
face of G[X]. Since the special face has w in its interior, and uvw is a triangle 
face in G, it follows that the edge uv is on the boundary of the special face. If 
the special face of G[X] is a triangle uvq, then the third vertex q of this face 
is called the special vertex of G[X]. If the special face of G[X] is not a triangle, 
then X is a nice blocker. We say that an edge e is face contractible in a plane 
Laman graph if e is contractible and the triangle containing e (which is unique 
by Lemma 6) is a face in the given embedding. 

We are ready to prove our main result on the existence of a face contractible 
edge. In fact, we shall prove a somewhat stronger result which will also be useful 
when we prove some extensions later. 
Theorem 4. Let G = (V, E) be a plane Laman graph with |V| ≥ 4. Then

(i) if uvw is a triangle face, f = uv is not contractible, and X is the maximal 
blocker of f with respect to uvw, then there is an edge in G[X] which is face 
contractible in G, 

(ii) for each vertex r ∈ V there exist at least two face contractible edges which
are not incident with r.

Proof. The proof is by induction on |V|. It is easy to check that the theorem
holds if |V| = 4 (in this case G is unique and has essentially one possible planar
embedding). So let us suppose that |V| ≥ 5 and the theorem holds for graphs
with fewer than |V| vertices.

First we prove (i). Consider a triangle face uvw for which f = uv is not
contractible, and let X be the maximal blocker of f with respect to uvw. Since
X is a critical set, the induced subgraph G[X] is Laman. Together with the
embedding obtained by restricting the embedding of G to the vertices and edges
of its subgraph induced by X, the graph G[X] is a plane Laman graph. Since
w ∉ X, G[X] has fewer than |V| vertices.

We call an edge e of G[X] proper if e ≠ f, e is face contractible in G[X], and
the triangle face of G[X] containing e is a face of G as well. It follows from the
definition and Lemma 8 that a proper edge e is face contractible in G as well.
We shall prove (i) by showing that there is a proper edge in G[X].

To this end first suppose that |X| = 3. Then G[X] is a triangle, and each of
its edges is contractible in G[X]. By Lemma 9 one of the two faces of G[X] is a
face of G as well. Thus each of the two edges of G[X] which are different from
f is proper.

Next suppose that |X| ≥ 4. By the induction hypothesis (ii) holds for G[X]
by choosing r = u. Thus there exist two face contractible edges e', e" in G[X]
which are not incident with u (and hence e' and e" must be different from f). If
X is a nice blocker then the triangle face containing e' (or e") in G[X] is a face
of G as well, by Lemma 9. Thus e' (or e") is proper.

If X is not a nice blocker then it has a special triangle face uvq, which is 
not a face of G, and each of the other faces of G[X] is a face of G by Lemma 9. 
Since e' and e" are distinct edges which are not incident with u, at least one of 
them, say e', is not an edge of the triangle uvq. Hence the triangle face of G[X] 
containing e' is a triangle face of G as well. Thus e' is proper. This completes 
the proof of (i). 

It remains to prove (ii). To this end let us fix a vertex r ∈ V. We have two
cases to consider. 

Case 1. There exists a triangle face uvw in G with r ∉ {u, v, w}.

If at least two edges on the triangle face uvw are face contractible then we 
are done. Otherwise we have blockers for two or three edges of uvw. 

If none of the edges of the triangle uvw is contractible then there exist
maximal blockers X, Y, Z for the edges vw, uw, and uv (with respect to u, v, and
w), respectively. By Lemma 4 we must have X ∩ Y = {w}, X ∩ Z = {v}, and
Y ∩ Z = {u} (since the sets are critical and d(Y, Z), d(X, Y), d(X, Z) ≥ 1 by the
existence of the edges of the triangle uvw). By our assumption r is not a vertex
of the triangle uvw. Thus r is contained by at most one of the sets X, Y, Z.
Without loss of generality, suppose that r ∉ X ∪ Y. By (i) each of the subgraphs
G[X], G[Y] contains an edge which is face contractible in G. These edges are
distinct and avoid r. Thus G has two face contractible edges not containing r,
as required.

Now suppose that uv is contractible but vw and uw are not contractible.
Then we have maximal blockers X, Y for the edges vw, uw, respectively. As
above, we must have X ∩ Y = {w} by Lemma 4. Since r ≠ w, we may assume,
without loss of generality, that r ∉ X. Then it follows from (i) that there is an
edge f in G[X] which is face contractible in G. Thus we have two edges (uv and
f), not incident with r, which are face contractible in G.

Case 2. Each of the triangle faces of G contains r. 

Consider a triangle face ruv of G. Then uv is face contractible, for otherwise
(i) would imply that there is a face contractible edge in G[X], where X is the
maximal blocker of uv with respect to ruv (in particular, there is a triangle face
of G which does not contain r), a contradiction. Since G has at least two triangle
faces by Lemma 2, it follows that G has at least two face contractible edges
avoiding r. This proves the theorem. □

Theorem 4 implies that a plane Laman graph on at least four vertices has a 
face contractible edge. A plane Laman graph on three vertices is a triangle, and 
each of its edges is face contractible. Note that after the contraction of a face
contractible edge the planar embedding of the resulting graph can be obtained 
by a simple local modification. 

Since contracting an edge of a triangle face is the inverse operation of plane
vertex splitting, the proof of the following theorem by induction is
straightforward.

Theorem 5. A graph is a plane Laman graph if and only if it can be obtained 
from an edge by plane vertex splitting operations. 

Theorem 4 implies that if G has at least four vertices then there is a face 
contractible edge avoiding any given triangle face. Thus any triangle face can be 
chosen as the starting configuration in Theorem 5. 

We say that G = (V, E) is a rigidity circuit if G — e is Laman for every edge 
e of G. We call it a Laman-plus-one graph if G — e is Laman for some edge e of 
G. From the matroid viewpoint it is clear that Laman-plus-one graphs are those 
rigid graphs (i.e. spanning sets of the rigidity matroid) which contain a unique 
rigidity circuit. In particular, rigidity circuits are Laman-plus-one graphs. 

By using similar techniques we can prove the following. The proof is omitted 
from this version. 

Theorem 6. A graph is a plane Laman-plus-one graph if and only if it can be
obtained from K_4 by plane vertex splitting operations.

Remark. A natural question is whether a 3-connected plane Laman graph has
a face contractible edge whose contraction preserves 3-connectivity as well (call
such an edge strongly face contractible). The answer is no: let G be a 3-connected
plane Laman graph and let G' be obtained from G by inserting, for each of its
triangle faces abc, a new triangle face a'b'c' and adding the edges aa', bb', cc'
(operation triangle insertion). Then G' is a 3-connected plane Laman graph with
no strongly face contractible edges. It is an open question whether there exist 
good local reduction steps which could lead to an inductive construction in the 
3-connected case, such that all intermediate graphs are also 3-connected. 

4 Pseudo-Triangulations 

Theorems 5 and 6 can be used to deduce a number of known results on pseudo- 
triangulations. One can give new proofs for Theorem 3 and for the following 
similar result (which was also proved in [6]): every planar Laman-plus-one graph 
can be embedded as a pointed-plus-one pseudo-triangulation. The proofs are 
omitted. They are similar to the proofs in [6] but the lengthy case analyses can 
be shortened, due to the fact that plane vertex splitting is more local. 

Theorems 5 and 6 can also be used to prove the main results of [6] on combi- 
natorial pseudo-triangulations of plane Laman and Laman-plus-one graphs. See 
[6] for more details on this combinatorial setting. 

5 Planar Duals 

Let G = (V, E) be a graph. We say that G is co-Laman if |E| = 2|V| − 1 and
i(X) ≤ 2|X| − 2 for all non-empty proper subsets X ⊂ V. Note that co-Laman graphs
may contain multiple edges. Let M(G) denote the circuit matroid of G and let
M*(G) denote its dual.
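The co-Laman condition can be tested by brute force in the same spirit. Because co-Laman graphs may have parallel edges and loops, the sketch below takes a list of pairs rather than a set. We read "proper subsets" as non-empty proper subsets, since the empty set would trivially fail i(X) ≤ 2|X| − 2. By a hand computation of ours, the last example is the planar dual of K_4 minus an edge: its three faces become three vertices joined by five edges.

```python
from itertools import combinations


def is_co_laman(n, edges):
    """Brute-force co-Laman test for a multigraph on vertices 0..n-1.

    edges is a list of pairs (u, v); loops (u == v) and repeated pairs are allowed.
    """
    if len(edges) != 2 * n - 1:
        return False
    for k in range(1, n):                       # non-empty proper subsets only
        for xs in combinations(range(n), k):
            s = set(xs)
            if sum(1 for u, v in edges if u in s and v in s) > 2 * k - 2:
                return False
    return True


print(is_co_laman(1, [(0, 0)]),                              # a single loop
      is_co_laman(2, [(0, 1)] * 3),                          # dual of a triangle
      is_co_laman(3, [(0, 1), (0, 2), (0, 2), (1, 2), (1, 2)]))
# True True True
```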

Theorem 7. Let G and H be a pair of planar graphs with M(G) = M*(H).
Then G is Laman if and only if H is co-Laman.

Proof. (Sketch) Simple matroid facts and Nash-Williams' characterisation of
graphs which can be partitioned into two spanning trees imply that
G is Laman ⇔ M(G/e) is the disjoint union of two bases (spanning trees) for all e ∈ E(G)
⇔ M(H) − e is the disjoint union of two bases (spanning trees) for all e ∈ E(H)
⇔ H is co-Laman. □

Recall the second Henneberg operation (edge splitting) from Section 1, which 
adds a new vertex and three new edges. If the third edge incident to the new 
vertex may also be connected to the endvertices of the removed edge (i.e. it may 
be parallel to one of the other new edges), we say that we perform a weak edge 
splitting operation. The plane weak edge splitting operation is the topological 
version of weak edge splitting (cf. Lemma 1). It is not difficult to see that the
dual of plane vertex splitting is plane weak edge splitting. Thus Theorem 5 and 
Theorem 7 imply the following inductive construction via planar duality. 

Theorem 8. A plane graph is co-Laman if and only if it can be obtained from 
a loop by plane weak edge splittings. 
6 Algorithms

In this section we describe the main ideas of an algorithm which can identify
a face contractible edge in a plane Laman graph G in O(n^2) time, where n
is the number of vertices of G. It uses a subroutine to test whether an edge
e = uv of a triangle face is contractible (and to find its maximal blocker, if it
is non-contractible), which is based on the following lemma. The maximal rigid
subgraphs of a graph are called the rigid components, see [5,8] for more details.

Lemma 10. Let G be a plane Laman graph and let e = uv be an edge of a
triangle face uvw of G. Then either e is contractible, or the rigid component C
of G − w containing e has at least three vertices. In the latter case the vertex set
of C is the maximal blocker of e with respect to uvw in G.

There exist algorithms for computing the rigid components of a graph in
O(n^2) time, see e.g. [1,4,7].

The algorithm identifies a decreasing sequence V = X_0, X_1, ..., X_t of subsets
of V such that X_{i+1} is a blocker of some triangle edge of G[X_i] for 0 ≤ i ≤ t − 1
and terminates with a contractible edge of G[X_t] which is also contractible in
G. First it picks an arbitrary triangle face uvw of G and tests whether any
of its edges is contractible (and computes the blockers, if they exist). If yes,
it terminates with the desired contractible edge. If not, it picks the smallest
blocker B found in this first iteration (which belongs to edge f = uv, say), puts
X_1 = B, and continues the search for a proper edge in G[X_1]. To this end it
picks a triangle in G[X_1] (which is different from the special face of G[X_1], if X_1
is not a nice blocker) and tests whether any of its edges which are different from
f (and which are not on the special face of G[X_1], if X_1 is not a nice blocker) is
contractible in G[X_1]. If yes, it terminates with that edge. Otherwise it picks the
smallest blocker B' found in this iteration among those blockers which do not
contain f (and which do not contain edges of the special face of G[X_1], if X_1 is
not a nice blocker), puts X_2 = B', and iterates. The following lemma, which can
be proved by using Lemma 4, shows that there are always at least two blockers
to choose from.

Lemma 11. (i) Let f be an edge and let uvw be a triangle face whose edges
(except f , if f is on the triangle) are non-contractible. Then at least two blockers 
of the edges of uvw do not contain f. 

(ii) Let xyz and uvw be two distinct triangle faces such that the edges of uvw 
(except those edges which are common with xyz) are non-contractible. Then at 
least two blockers of the edges of uvw do not contain edges of xyz. 

It follows from Lemma 8, Lemma 9, and Lemma 11 that the algorithm will
eventually find a proper edge e in some subgraph G[X_t], and that this edge will
be face contractible in G as well. The fact that the algorithm picks the smallest
blocker from at least two candidates (and that the blockers have exactly one
vertex in common by Lemma 4) implies that the size of X_i will be essentially
halved after every iteration. From this it is not difficult to see that the total
running time is also O(n^2).
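The calculation below is one way to make "essentially halved" and the final bound precise. It rests on two assumptions, which are our reading rather than statements from the text. First, iteration i costs at most c|X_i|^2 for a constant c, because it performs a constant number of rigid-component computations on G[X_i]. Second, the two candidate blockers B, B' ⊆ X_i meet in exactly one vertex, so |B| + |B'| ≤ |X_i| + 1.

```latex
\begin{aligned}
|X_{i+1}| &\le \min(|B|, |B'|) \le \tfrac{1}{2}\bigl(|X_i| + 1\bigr)
  \;\Longrightarrow\; |X_{i+1}| - 1 \le \tfrac{1}{2}\bigl(|X_i| - 1\bigr)
  \;\Longrightarrow\; |X_i| - 1 \le \frac{n-1}{2^i}, \\
\sum_{i=0}^{t} c\,|X_i|^2 &\le \sum_{i=0}^{t} 2c\bigl((|X_i| - 1)^2 + 1\bigr)
  \le 2c\,(n-1)^2 \sum_{i \ge 0} 4^{-i} + 2c\,(t+1)
  = \tfrac{8}{3}\,c\,(n-1)^2 + 2c\,(t+1) = O(n^2).
\end{aligned}
```

The first line uses |X_0| = n and induction on i. The second uses (a + 1)^2 ≤ 2(a^2 + 1) together with t = O(log n), which holds because every blocker X_i with i ≥ 1 has at least three vertices.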
Theorem 9. A face contractible edge of a plane Laman graph can be found in
O(n^2) time.

Theorem 9 implies that an inductive construction via vertex splitting can be
built in O(n^3) time, since the construction consists of n − 2 contractions, each
found in O(n^2) time. This matches the best time bound for finding a Henneberg
construction, see e.g. [6].
Abstract. We obtain faster algorithms for problems such as r-
dimensional matching, r-set packing, graph packing, and graph edge 
packing when the size k of the solution is considered a parameter. We 
first establish a general framework for finding and exploiting small prob- 
lem kernels (of size polynomial in k). Previously such a kernel was known 
only for triangle packing. This technique lets us combine, in a new and 
sophisticated way, Alon, Yuster and Zwick’s color-coding technique with 
dynamic programming on the structure of the kernel to obtain faster 
fixed-parameter algorithms for these problems. Our algorithms run in 
time 0(n + an improvement over previous algorithms for some 

of these problems running in time 0{n + The flexibility of our 

approach allows tuning of algorithms to obtain smaller constants in the 
exponent. 

1 Introduction 

In this paper we demonstrate a general method for solving parameterized packing and matching problems by first finding a problem kernel, and then showing how color-coding (the use of nice families of hash functions to color a substructure with distinct colors) and dynamic programming on subsets of colors can be used to create fixed-parameter algorithms. The problems we consider include parameterized versions of r-dimensional matching (a generalization of 3-dimensional matching, from Karp’s original list of NP-complete problems [Kar72]), r-set packing (on Karp’s list in its unrestricted form, and W[1]-complete [ADP80] in its unrestricted parameterized form, hence unlikely to be fixed-parameter tractable), graph packing (a generalization of subgraph isomorphism known to be NP-complete [Coo71]), and graph edge packing (shown to be NP-complete by Holyer [Hol81]). Further generalizations are considered at the end of the paper.

Previous efforts in finding fixed-parameter algorithms for these problems (asking if a given input has a solution of size at least k) have resulted in running times involving a factor of $k^{O(k)}$, namely $O((5.7k)^k \cdot n)$ for 3-dimensional matching [CFJK01] and 3-set packing [JZC04]. Earlier work did not use the technique of kernelization (finding a subproblem of size f(k) within which any solution of size k must lie) except for work on the problem of packing triangles [FHR+04]; that result uses the technique of crown decomposition [Fel03,CFJ04], a technique that is not used in this paper (concurrent with this paper are further uses of this technique for packing triangles [MPS04] and stars [PS04]). Using our techniques, we improve all these results by replacing the $k^{O(k)}$ factor with a factor of $2^{O(k)}$.

Kernelization also allows us to make the dependence on k additive, that is, we can achieve a running time of $O(n + 2^{O(k)})$ (the hidden constant in the exponent is linearly dependent on r).

The techniques we introduce make use of the fact that our goal is to obtain 
at least k disjoint objects, each represented as a tuple of values. We first form a 
maximal set of disjoint objects and then use this set both as a way to bound the 
number of values outside the set (forming the kernel) and as a tool in the color- 
coding dynamic programming (forming the algorithm). We are able to reduce 
the size of the constants inherent in the choice of hash functions in the original 
formulation of color-coding [AYZ95] by applying the coloring to our kernels only, 
and using smaller families of hash functions with weaker properties. Although 
both color-coding and dynamic programming across subsets appear in other 
work [Mar04,Woe03,CFJ04], our technique breaks new ground by refining these 
ideas in their joint use. 

After introducing notation common to all the problems in Section 2, we first present the kernelization algorithm for r-Dimensional Matching (Section 3), followed by the fixed-parameter algorithm in Section 4. Next, we consider the modifications necessary to solve the other problems in Section 5. The paper concludes with a discussion of tuning the hash functions (Section 6) and further generalization of our techniques (Section 7).

2 Definitions 

Each of the problems considered in this paper requires the determination of k or more disjoint structures within the given input: in r-dimensional matching, we find k points so that no coordinates are repeated; in r-set packing, we find k disjoint sets; in graph packing, we find k vertex-disjoint subgraphs isomorphic to input H; and in graph edge packing, we find k edge-disjoint subgraphs isomorphic to input H. To unify the approach taken to the problems, we view each structure as an r-tuple (where in the last two cases $r = |V(H)|$ and $r = |E(H)|$, respectively); the problems differ in the ways the r-tuples must be disjoint and, in the case of graph packing, on additional constraints put on the structures.

To define the r-tuples, we consider a collection $A_1, \ldots, A_r$ of pairwise disjoint sets and define $\mathcal{A}$ to be $A_1 \times \cdots \times A_r$. We call the elements of each $A_i$ values; given an r-tuple $T \in \mathcal{A}$, we denote as val(T) the set of values of T, and for any $S \subseteq \mathcal{A}$, we define $\mathrm{val}(S) = \bigcup_{T \in S} \mathrm{val}(T)$. Depending on the problem, we will either wish to ensure that the tuples chosen have disjoint sets of values or, for r-dimensional matching, that tuples “disagree” at a particular coordinate.

We introduce further notation to handle the discussion of r-tuples. Given a tuple $T \in \mathcal{A}$, the ith coordinate of T, denoted T(i), is the value of T from set $A_i$. Two r-tuples T and T' are linked if they agree in some coordinate i, that is, when there exists some i in $\{1, \ldots, r\}$ such that $T(i) = T'(i)$. We denote this fact as $T \sim T'$. If $T \sim T'$ does not hold then we say that T and T' are independent, denoted $T \not\sim T'$.

We can now define our example problem, r-Dimensional Matching, in which the input is a set of r-tuples and the goal is to find at least k independent tuples. The precise descriptions of the other problems will be deferred to Section 5.

r-Dimensional Matching

Instance: A set $S \subseteq \mathcal{A} = A_1 \times \cdots \times A_r$ for some collection $A_1, \ldots, A_r$ of pairwise disjoint sets.

Parameter: A non-negative integer k.

Question: Is there a matching $\mathcal{P}$ for S of size at least k, that is, is there a subset $\mathcal{P}$ of S where $T \not\sim T'$ for any distinct $T, T' \in \mathcal{P}$, and $|\mathcal{P}| \geq k$?

In forming a kernel, we will be able to reduce the number of tuples that contain a particular value or set of values. To describe a set of tuples in which some values are specified and others may range freely, we will need an augmented version $A_i^*$ of each $A_i$ so that $A_i^* = A_i \cup \{*\}$, $i \in \{1, \ldots, r\}$; the stars will be used to denote positions at which values are unrestricted. We will refer to special tuples, patterns, drawn from the set $\mathcal{A}^* = A_1^* \times \cdots \times A_r^*$. Associated with each pattern will be a corresponding set of tuples. More formally, given $R \in \mathcal{A}^*$, we set $\mathcal{A}[R] = A'_1 \times \cdots \times A'_r$ where

$$A'_i = \begin{cases} A_i & \text{if } R(i) = *, \\ \{R(i)\} & \text{otherwise.} \end{cases}$$

As we will typically wish to consider the subset of S extracted using the pattern, we further define, for any $S \subseteq \mathcal{A}$, the set $S[R] = \mathcal{A}[R] \cap S$. We finally define free(R) as the number of *’s in R, namely the number of unrestricted positions in the pattern; free(R) will be called the freedom of R.
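To make the pattern notation concrete, here is a minimal Python sketch. It assumes a tuple is a Python tuple of hashable values and uses a hypothetical sentinel STAR (Python's None) for *. The helper names free, in_pattern, restrict, and patterns_of are illustrative, not taken from the paper.

```python
from itertools import product
from typing import Hashable, Iterable, Optional, Set, Tuple

# Hypothetical encoding: STAR (None) marks a '*' position; assumes None is never a value.
STAR = None
Tup = Tuple[Hashable, ...]
Pattern = Tuple[Optional[Hashable], ...]


def free(R: Pattern) -> int:
    """free(R): the number of unrestricted (*) positions of R."""
    return sum(1 for x in R if x is STAR)


def in_pattern(T: Tup, R: Pattern) -> bool:
    """True iff T lies in A[R], i.e. T agrees with R at every fixed position."""
    return all(x is STAR or t == x for t, x in zip(T, R))


def restrict(S: Iterable[Tup], R: Pattern) -> Set[Tup]:
    """S[R] = A[R] ∩ S."""
    return {T for T in S if in_pattern(T, R)}


def patterns_of(T: Tup) -> Set[Pattern]:
    """All 2^r patterns R with T in A[R]: each position is either kept or starred."""
    return {tuple(STAR if starred else t for t, starred in zip(T, stars))
            for stars in product((False, True), repeat=len(T))}


S = {("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c1")}
R = ("a1", STAR, STAR)
print(free(R), sorted(restrict(S, R)))  # freedom 2 and the two tuples starting with a1
```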

3 A Kernel for r-Dimensional Matching

Recall that a kernel is a subproblem of size depending only on k within which any solution of size k must lie. The idea behind our kernelization technique is that if a large number of tuples match a particular pattern, we can remove some of them (this is known as a reduction rule). For each pattern R, the threshold size for the set of retained tuples is a function f(free(R), k), which we define later.

The following gives a general family of reduction rules for the r-Dimensional Matching problem with input $S_I$ and parameter k. For each application of the function Reduce, if the size of the subset $S_I[R]$ of $S_I$ extracted using the pattern R is greater than the threshold, then $|S_I|$ is reduced by removing just enough elements of $S_I[R]$ to bring its size down to the threshold f(free(R), k).



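Function Reduce($S_I$, R) where $R \in \mathcal{A}^*$ and $1 \leq \mathrm{free}(R) \leq r - 1$.

Input: A set $S_I \subseteq \mathcal{A}$ and a pattern $R \in \mathcal{A}^*$.

Output: A set $S_O \subseteq S_I$.

$S_O \leftarrow S_I$.

If $|S_I[R]| > f(\mathrm{free}(R), k)$
then remove all but $f(\mathrm{free}(R), k)$ tuples of $S_I[R]$ from $S_O$.

Output $S_O$.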



To form the kernel, we apply the function Reduce on all patterns $R \in \mathcal{A}^*$ in order of increasing freedom, as in the following routine.



Function Fully-Reduce($S_I$)

Input: A set $S_I \subseteq \mathcal{A}$.

Output: A set $S_O \subseteq S_I$.

$S_O \leftarrow S_I$.

For $i = 1, \ldots, r - 1$ do

for all $R \in \mathcal{A}^*$ where $\mathrm{free}(R) = i$, $S_O \leftarrow$ Reduce($S_O$, R).

Output $S_O$.
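The two routines above can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: tuples are Python tuples, STAR = None stands for *, and the helper names are hypothetical. Only patterns derived from some current tuple can have a non-empty $S_O[R]$, so the sketch enumerates just those patterns rather than all of $\mathcal{A}^*$. It uses the threshold f(ℓ, k) as it is defined later in this section.

```python
from itertools import combinations
from typing import Hashable, Iterable, Optional, Set, Tuple

STAR = None  # hypothetical encoding of '*'; assumes None is never a value
Tup = Tuple[Hashable, ...]
Pattern = Tuple[Optional[Hashable], ...]


def f(ell: int, k: int) -> int:
    """Threshold: f(0, k) = 1 and f(ell, k) = ell*(k-1)*f(ell-1, k) + 1."""
    value = 1
    for i in range(1, ell + 1):
        value = i * (k - 1) * value + 1
    return value


def in_pattern(T: Tup, R: Pattern) -> bool:
    return all(x is STAR or t == x for t, x in zip(T, R))


def reduce_(S_I: Set[Tup], R: Pattern, k: int) -> Set[Tup]:
    """Reduce(S_I, R): keep at most f(free(R), k) tuples of S_I[R]."""
    matching = [T for T in S_I if in_pattern(T, R)]
    limit = f(sum(x is STAR for x in R), k)
    if len(matching) <= limit:
        return set(S_I)
    # Which tuples survive is arbitrary; we drop everything past the first `limit`.
    return set(S_I) - set(matching[limit:])


def patterns_with_freedom(T: Tup, i: int):
    """Patterns derived from T by starring exactly i positions."""
    for starred in combinations(range(len(T)), i):
        yield tuple(STAR if j in starred else T[j] for j in range(len(T)))


def fully_reduce(S_I: Iterable[Tup], k: int) -> Set[Tup]:
    """Fully-Reduce: apply Reduce to patterns of freedom 1, ..., r-1 in order."""
    S_O = set(S_I)
    if not S_O:
        return S_O
    r = len(next(iter(S_O)))
    for i in range(1, r):
        # Reduce only deletes tuples, so patterns derived from this snapshot
        # cover every R with S_O[R] non-empty during the round.
        candidates = {R for T in S_O for R in patterns_with_freedom(T, i)}
        for R in candidates:
            S_O = reduce_(S_O, R, k)
    return S_O
```

For example, ten triples that all share the value a1 are cut down to f(2, 2) = 5 by the freedom-2 pattern (a1, *, *).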



The next three lemmas prove that the above function computes a kernel; the 
two lemmas after that show how a slight modification can be used to bound the 
size of that kernel. 

We say that a set $S \subseteq \mathcal{A}$ is $(\ell, k)$-reduced if for any $R \in \mathcal{A}^*$ such that $\mathrm{free}(R) = \ell$, Reduce(S, R) = S (where $1 \leq \ell \leq r - 1$). For notational convenience we define any set $S \subseteq \mathcal{A}$ to be $(0, k)$-reduced; we can assume the input has no repeated tuples. As a direct consequence of the definition, if a set S is $(\ell, k)$-reduced, then for each pattern R such that $\mathrm{free}(R) = \ell$, $|S[R]| \leq f(\ell, k)$.

We now show that our reduction function does not change a yes-instance of r-Dimensional Matching into a no-instance, nor vice versa. Given a set $S \subseteq \mathcal{A}$ and a non-negative integer k, we denote as $\mu(S, k)$ the set of all matchings for S of size at least k. If $\mu(S, k)$ is non-empty, S is a yes-instance. The following lemma shows that upon applying the reduction rule to an $(\ell - 1, k)$-reduced set using a pattern R of freedom $\ell$, yes-instances will remain yes-instances. The proof requires the precise definition of $f(\ell, k)$, which is that $f(0, k) = 1$ and $f(\ell, k) = \ell \cdot (k - 1) \cdot f(\ell - 1, k) + 1$ for all $\ell > 0$. It is easy to see that $f(\ell, k) \leq \ell!\, k^\ell$.
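A quick numerical check of this recurrence, written as a small Python sketch that is not from the paper, confirms two facts for small parameters. It checks the bound $f(\ell, k) \leq \ell!\, k^\ell$, and it checks the identity $\ell (k-1) f(\ell-1, k) = f(\ell, k) - 1$, which the proof of Lemma 1 relies on.

```python
from math import factorial


def f(ell: int, k: int) -> int:
    """f(0, k) = 1 and f(ell, k) = ell * (k - 1) * f(ell - 1, k) + 1 for ell > 0."""
    value = 1
    for i in range(1, ell + 1):
        value = i * (k - 1) * value + 1
    return value


# Check the bound f(ell, k) <= ell! * k^ell and the counting identity
# ell*(k-1)*f(ell-1, k) = f(ell, k) - 1 for small parameters.
for k in range(1, 10):
    for ell in range(0, 10):
        assert f(ell, k) <= factorial(ell) * k ** ell
        if ell > 0:
            assert ell * (k - 1) * f(ell - 1, k) == f(ell, k) - 1

print([f(ell, 3) for ell in range(4)])  # [1, 3, 13, 79]
```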
Lemma 1. For any $\ell$, $1 \leq \ell \leq r - 1$, the following holds: If $S \subseteq \mathcal{A}$, S is $(\ell - 1, k)$-reduced, and $S' = \mathrm{Reduce}(S, R)$ for some R such that $\mathrm{free}(R) = \ell$, then $\mu(S, k) \neq \emptyset$ if and only if $\mu(S', k) \neq \emptyset$.

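Proof. Since $S' \subseteq S$, $\mu(S', k) \neq \emptyset$ implies $\mu(S, k) \neq \emptyset$. Supposing now that $\mu(S, k) \neq \emptyset$, we choose $\mathcal{P} \in \mu(S, k)$ where $|\mathcal{P}| = k$ and prove that $\mu(S', k) \neq \emptyset$. In essence, we need to show that either $\mathcal{P} \in \mu(S', k)$ or that we can form a matching $\mathcal{P}' \in \mu(S', k)$ using some of the tuples in $\mathcal{P}$.

Denote by $\tilde{S}$ the set $S'[R]$, that is, the tuples retained when Reduce is applied to S using pattern R to form S'. Since $\mathrm{free}(R) \leq r - 1$, all tuples of $S[R]$ agree in at least one coordinate, so $\mathcal{P}$ contains at most one tuple of $S[R]$. Moreover, Reduce deletes only tuples of $S[R]$. Clearly if either $\mathcal{P}$ contains no tuple in $S[R]$ or the single tuple $T \in \mathcal{P} \cap S[R]$ was selected to be in $\tilde{S}$, then $\mathcal{P} \in \mu(S', k)$, completing the proof in these two cases.

In the case where $T \in \mathcal{P} \cap S[R]$ and $T \notin \tilde{S}$, we show that there is a tuple $T' \in \tilde{S}$ that can replace T to form a new matching, formalized in the claim below.
Claim: There exists a $T' \in \tilde{S}$ that is independent from each $\hat{T} \in \mathcal{P} - \{T\}$.

Proof: We first show that for any $i \in \{j \mid R(j) = *\}$ and any $\hat{T} \in \mathcal{P} - \{T\}$ there exist at most $f(\ell - 1, k)$ r-tuples in $\tilde{S}$ that agree with $\hat{T}$ at position i. The set of r-tuples in $S[R]$ that agree with $\hat{T}$ at position i is exactly the set of tuples in $S[R']$, where R' is obtained from R by replacing the * in position i by $\hat{T}(i)$.

We use the size of this set to bound the size of the set of tuples in $\tilde{S}$ that agree with $\hat{T}$ at position i. As S is $(\ell - 1, k)$-reduced and $\mathrm{free}(R') = \ell - 1$, we know that $|S[R']| \leq f(\ell - 1, k)$. In the special case $\ell = 1$, R' contains no *, so we directly have $|S[R']| \leq 1 = f(0, k)$ because the input has no repeated tuples. Hence the bound of $f(\ell - 1, k)$ holds for any $\hat{T}$ and any i.

To complete the proof of the claim, we determine the number of elements of $\tilde{S}$ that can be linked with any $\hat{T}$ and show that at least one member of $\tilde{S}$ is not linked with any $\hat{T}$ in the set $\mathcal{P} - \{T\}$. Every tuple of $\tilde{S}$ agrees with T at each fixed position of R, and $\hat{T} \not\sim T$, so a tuple of $\tilde{S}$ can be linked with $\hat{T}$ only at a free position of R. Using the statement proved above, as $|\{j \mid R(j) = *\}| = \ell$ and $|\mathcal{P} - \{T\}| = k - 1$, at most $\ell \cdot (k - 1) \cdot f(\ell - 1, k) = f(\ell, k) - 1$ elements of $\tilde{S}$ will be linked with elements of $\mathcal{P} - \{T\}$. Since Reduce removed T, it kept exactly $|\tilde{S}| = f(\ell, k)$ tuples, and this implies the existence of some $T' \in \tilde{S}$ with the required property.

The claim above implies that $\mathcal{P}' = (\mathcal{P} - \{T\}) \cup \{T'\}$ is a matching of S of size k. As $T' \in \tilde{S}$, $\mathcal{P}'$ is also a matching of S', and thus $\mu(S', k) \neq \emptyset$. □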

Lemma 2 can be proved by induction on the number of iterations; proofs for 
this and subsequent lemmas are omitted due to space limitations. 

Lemma 2. If $S \subseteq \mathcal{A}$ and S' is the output of routine Fully-Reduce(S), then (a) S' is an $(r - 1, k)$-reduced set and (b) $\mu(S, k) \neq \emptyset$ if and only if $\mu(S', k) \neq \emptyset$.

We now use a maximal matching in conjunction with the function Fully-Reduce defined above to bound the size of the kernel. The following lemma is a direct consequence of the definition of an $(r - 1, k)$-reduced set.

Lemma 3. For any $(r - 1, k)$-reduced set $S \subseteq \mathcal{A}$, any value $x \in \mathrm{val}(\mathcal{A})$ is contained in at most $f(r - 1, k)$ r-tuples of S.

We now observe that if we have a maximal matching in a set S of r-tuples, we can bound the size of the set as a function of r, the size of the matching, and $f(r - 1, k)$, as each r-tuple in the set must be linked with at least one r-tuple in the matching, and the number of such links is bounded by $f(r - 1, k)$ per value.
Lemma 4. If $S \subseteq \mathcal{A}$ is an $(r - 1, k)$-reduced set containing a maximal matching $\mathcal{M}$ of size at most m, then S contains no more than $r \cdot m \cdot f(r - 1, k)$ r-tuples.

Lemma 4 suggests a kernel for the r-Dimensional Matching problem in the function that follows; Lemma 5 shows the correctness of the procedure and the size of the kernel.



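Function Kernel-Construct(S)

Input: A set $S \subseteq \mathcal{A}$.

Output: A set $\mathcal{K} \subseteq S$.

$S' \leftarrow$ Fully-Reduce(S).

Find a maximal matching $\mathcal{M}$ of S'.

If $|\mathcal{M}| \geq k$ then output any subset $\mathcal{K}$ of $\mathcal{M}$ of size k and stop;
otherwise output $\mathcal{K} \leftarrow S'$.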
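Kernel-Construct can be sketched in Python as below, again under our own encoding assumptions and with hypothetical function names. The maximal matching is found greedily, which is the approach the text later mentions for this step. Independence is tested as disjointness of value sets, which is valid because the $A_i$ are pairwise disjoint.

```python
from itertools import combinations
from typing import Hashable, Iterable, List, Set, Tuple

STAR = None  # hypothetical encoding of '*'
Tup = Tuple[Hashable, ...]


def f(ell: int, k: int) -> int:
    value = 1
    for i in range(1, ell + 1):
        value = i * (k - 1) * value + 1
    return value


def fully_reduce(S: Iterable[Tup], k: int) -> Set[Tup]:
    """Compact Fully-Reduce: patterns of freedom 1..r-1, in increasing freedom."""
    S_O = set(S)
    r = len(next(iter(S_O))) if S_O else 0
    for i in range(1, r):
        for R in {tuple(STAR if j in pos else T[j] for j in range(r))
                  for T in S_O for pos in combinations(range(r), i)}:
            hits = [T for T in S_O
                    if all(x is STAR or t == x for t, x in zip(T, R))]
            S_O -= set(hits[f(i, k):])  # no-op unless |S_O[R]| > f(i, k)
    return S_O


def greedy_maximal_matching(S: Iterable[Tup]) -> List[Tup]:
    """Greedy maximal matching. Because the A_i are pairwise disjoint,
    two tuples are independent iff their value sets are disjoint."""
    M: List[Tup] = []
    used: Set[Hashable] = set()
    for T in S:
        if used.isdisjoint(T):
            M.append(T)
            used.update(T)
    return M


def kernel_construct(S: Iterable[Tup], k: int):
    S_prime = fully_reduce(S, k)
    M = greedy_maximal_matching(S_prime)
    if len(M) >= k:
        return M[:k]   # already a matching of size k
    return S_prime     # at most r * |M| * f(r-1, k) tuples by Lemma 4
```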



Lemma 5. If $S \subseteq \mathcal{A}$ and $\mathcal{K}$ is the output of the function Kernel-Construct, then (a) $|\mathcal{K}| \in O(k^r)$ and (b) $\mu(S, k) \neq \emptyset$ if and only if $\mu(\mathcal{K}, k) \neq \emptyset$.

Note that the function Kernel-Construct can be computed in time O(n) for fixed r, simply by looking at each tuple in turn and incrementing the counter associated with each of the $2^r - 1$ patterns derived from it, deleting the tuple if any such counter overflows its threshold.
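The one-pass counting idea in the note above might look like the following Python sketch. This is our reading of the note, not code from the paper. Counters are kept for the $2^r - 1$ patterns of freedom 0 to r − 1 derived from each tuple; freedom 0 is the tuple itself, so repeated tuples are dropped. A tuple is deleted if keeping it would push any counter past its threshold f(free(R), k). A dictionary lookup stands in for the constant-time counter access that the O(n) bound assumes.

```python
from collections import Counter
from itertools import product
from typing import Hashable, List, Sequence, Tuple

Tup = Tuple[Hashable, ...]
STAR = None  # hypothetical encoding of '*'


def f(ell: int, k: int) -> int:
    value = 1
    for i in range(1, ell + 1):
        value = i * (k - 1) * value + 1
    return value


def one_pass_reduce(S: Sequence[Tup], k: int) -> List[Tup]:
    """Scan tuples once, keeping one counter per pattern of freedom 0..r-1."""
    counts: Counter = Counter()
    kept: List[Tup] = []
    for T in S:
        derived = []
        for stars in product((False, True), repeat=len(T)):
            freedom = sum(stars)
            if freedom < len(T):  # skip the all-star pattern: 2^r - 1 remain
                R = tuple(STAR if s else t for t, s in zip(T, stars))
                derived.append((R, freedom))
        # Delete T if keeping it would overflow some counter's threshold.
        if any(counts[R] >= f(freedom, k) for R, freedom in derived):
            continue
        for R, _ in derived:
            counts[R] += 1
        kept.append(T)
    return kept
```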

4 An FPT Algorithm for r-Dimensional Matching

While it suffices to restrict attention to the kernel $\mathcal{K}$ when seeking a matching of size k, exhaustive search of $\mathcal{K}$ does not lead to a fast algorithm. Hence we propose a novel alternative, which combines the colour-coding technique of Alon et al. [AYZ95] with the use of a maximal matching $\mathcal{M} \subseteq \mathcal{K}$.

In its original form, the colour-coding technique of Alon et al. makes use of a family of hash functions $\mathcal{F} = \{f : U \to X\}$ with the property that for any $S \subseteq U$ with $|S| \leq |X|$, there is an $f \in \mathcal{F}$ that is 1-1 on S. The original idea, in the context of our problem, would be to look for a matching of size k whose values receive distinct colours under some $f \in \mathcal{F}$. In our case, the universe U is the set of values appearing in tuples in the kernel of the problem, which has size $O(k^r)$, and the set of colours X has size rk. The family of hash functions $\mathcal{F}$ used by Alon et al. is of size $2^{O(|X|)} \log |U|$, which is $2^{O(rk)}$ in our case. In Section 6 we will consider modifications towards improving this approach.

To modify colour-coding for our purposes, we first compute a maximal matching $\mathcal{M}$ in the kernel. By the maximality of $\mathcal{M}$, each r-tuple in a matching $\mathcal{P}$ of size k in the kernel must contain a value in $\mathrm{val}(\mathcal{M})$. Hence there may be at most $(r - 1)k$ values of $\mathcal{P}$ that do not belong to $\mathrm{val}(\mathcal{M})$. We will seek a proper colouring of these values. Thus we choose a universe $U = \mathrm{val}(\mathcal{K}) - \mathrm{val}(\mathcal{M})$, and we set $|X| = (r - 1)k$.

Our dynamic programming problem space has four dimensions. Two of them are the choice of hash function $\phi$ and a subset W of values of $\mathcal{M}$ that might be used by $\mathcal{P}$. For each choice of these, there are two more dimensions associated with a possible subset $\mathcal{P}'$ of $\mathcal{P}$: the set of values Z in W it uses, and the set of colours C that $\phi$ assigns to its values not in W. More formally, for $\mathcal{M}$ a maximal matching of $S \subseteq \mathcal{A}$, for any $W \subseteq \mathrm{val}(\mathcal{M})$, $\phi \in \mathcal{F}$, $Z \subseteq W$, and $C \subseteq X$ such that $|Z| + |C| \leq rk$ and $|Z| + |C| \equiv 0 \bmod r$, we define

$$B_{\phi,W}(Z, C) = \begin{cases} 1 & \text{if there exists a matching } \mathcal{P}' \subseteq S \text{ where } |\mathcal{P}'| = (|Z| + |C|)/r, \\ & \quad \mathrm{val}(\mathcal{P}') \cap \mathrm{val}(\mathcal{M}) = Z, \text{ and } \phi(\mathrm{val}(\mathcal{P}') - Z) = C, \\ 0 & \text{otherwise,} \end{cases}$$

where for convenience we use the notation $\phi(S)$ for a set S to be $\{\phi(s) \mid s \in S\}$. In order to solve the problem, our goal is then to use dynamic programming to determine $B_{\phi,W}(Z, C)$ for each W, for each $\phi$ in the family $\mathcal{F}$, and for each Z and C such that $|Z| + |C| \leq rk$.

To count the number of dynamic programming problems thus defined, we note that there are at most $2^{O(rk)}$ choices for W, C, and Z, and $2^{O(rk)}$ choices of $\phi \in \mathcal{F}$. Thus there are $2^{O(rk)}$ problems in total.

To formulate the recurrence for our dynamic programming problem, we note that $B_{\phi,W}(Z, C) = 1$ for $|Z| + |C| = 0$. For $|Z| + |C| > 0$, $B_{\phi,W}(Z, C) = 1$ holds precisely when there exists an r-tuple formed of a subset Z' of Z and a subset C' of C such that there is a matching of size one smaller using $Z - Z'$ and $C - C'$. For ease of exposition, we define the function $P_\phi : 2^W \times 2^X \to \{0, 1\}$ (computable in $O(k^r)$ time) as

$$P_\phi(Z', C') = \begin{cases} 1 & \text{if there exists an r-tuple } T \in S \text{ where } Z' \subseteq \mathrm{val}(T) \text{ and } \phi(\mathrm{val}(T) - Z') = C', \\ 0 & \text{otherwise.} \end{cases}$$

Observing that each r-tuple must contain at least one element in Z, we can then calculate $B_{\phi,W}(Z, C)$ by dynamic programming as follows:

$$B_{\phi,W}(Z, C) = \begin{cases} 1 & \text{if there exist } Z' \subseteq Z,\ C' \subseteq C \text{ with } |Z'| \geq 1,\ |Z'| + |C'| = r, \\ & \quad P_\phi(Z', C') = 1 \text{ and } B_{\phi,W}(Z - Z', C - C') = 1, \\ 0 & \text{otherwise.} \end{cases}$$
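The recurrence can be evaluated with memoization, as in this Python sketch for a single choice of W and colouring φ. Several assumptions here are ours rather than the paper's. Each tuple is reduced to a signature (Z', C'), with Z' taken to be exactly the tuple's values in val(M); this is our reading of $P_\phi$. Colours are integers. The search over all (Z, C) with |Z| + |C| = rk is done by plain enumeration. The function name is hypothetical.

```python
from functools import lru_cache
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Tup = Tuple[Hashable, ...]


def colourful_matching_exists(S: Iterable[Tup], M_values: Set[Hashable],
                              phi: Dict[Hashable, int], W: Set[Hashable],
                              colours: Iterable[int], r: int, k: int) -> bool:
    """True iff B(Z, C) = 1 for some Z ⊆ W, C ⊆ colours with |Z| + |C| = rk."""
    # P_phi as a set of signatures (Z', C'): Z' = val(T) ∩ val(M), and C' holds
    # the colours of T's other values, which must all be distinct (a tuple has
    # r distinct values because the A_i are pairwise disjoint).
    sigs: Set[Tuple[FrozenSet, FrozenSet]] = set()
    for T in S:
        Zp = frozenset(v for v in T if v in M_values)
        Cp = frozenset(phi[v] for v in T if v not in M_values)
        if Zp and Zp <= W and len(Zp) + len(Cp) == r:
            sigs.add((Zp, Cp))

    @lru_cache(maxsize=None)
    def B(Z: FrozenSet, C: FrozenSet) -> bool:
        if not Z and not C:
            return True  # the empty matching
        # Peel off one tuple (Z', C') and recurse on what is left.
        return any(Zp <= Z and Cp <= C and B(Z - Zp, C - Cp) for Zp, Cp in sigs)

    W_list, X = list(W), list(colours)
    for z in range(min(len(W_list), r * k) + 1):
        c = r * k - z
        if c > len(X):
            continue
        for Z in combinations(W_list, z):
            for C in combinations(X, c):
                if B(frozenset(Z), frozenset(C)):
                    return True
    return False
```

In the usage check, the greedy matching {(a1, b1)} is maximal but has size 1. The size-2 matching {(a1, b2), (a2, b1)} is found only when b2 and a2 receive different colours.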

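The number of choices for Z' and C' can be bounded by $\binom{rk}{r} = O(k^r)$, and each evaluation of $P_\phi$ takes $O(k^r)$ time, so one table entry can be computed in $O(k^{2r})$ time. The algorithm below is a summary of the above description, where $\mathcal{F}$ is a set of hash functions from $\mathrm{val}(S) - \mathrm{val}(\mathcal{M})$ to X.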

In merging the following routine with the kernel construction, the maximal matching needs to be found only once. To find a maximal matching in the kernel, we use a greedy algorithm running in time $O(k^r)$. The running time is dominated by kernel construction (taking time O(n)) plus dynamic programming (taking time $2^{O(rk)}$). This yields the following theorem.

Theorem 1. The problem r-Dimensional Matching can be solved in time $O(n + 2^{O(rk)})$.
Routine Color-Coding-Matching(S)

Input: A set $S \subseteq \mathcal{A}$.

Output: A member of the set {Yes, No}.

Find a maximal matching $\mathcal{M}$ of S.

If $|\mathcal{M}| \geq k$ then output “Yes” and stop;
otherwise set $V = \mathrm{val}(S) - \mathrm{val}(\mathcal{M})$.

For each $W \subseteq \mathrm{val}(\mathcal{M})$, do

for each coloring $\phi : V \to X$ where $\phi \in \mathcal{F}$
for each $Z \subseteq W$ by increasing $|Z|$

for each $C \subseteq X$ by increasing $|C|$, where $|Z| + |C| \equiv 0 \bmod r$
compute $B_{\phi,W}(Z, C)$.

if $B_{\phi,W}(Z, C) = 1$ and $|Z| + |C| = rk$, output “Yes” and stop.

Output “No”.
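Putting the pieces together, the routine above can be sketched in Python as follows. In place of the deterministic family $\mathcal{F}$, the sketch draws random colourings, the alternative discussed in Section 6. A “Yes” answer is therefore always correct, while a “No” answer may be wrong with small probability. The trial count, seed, and function names are illustrative assumptions. The loop over W mirrors the routine, even though W = val(M) alone already covers every Z.

```python
import random
from functools import lru_cache
from itertools import chain, combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Tup = Tuple[Hashable, ...]


def greedy_maximal_matching(S: Sequence[Tup]) -> List[Tup]:
    """Greedy maximal matching; assumes the A_i are pairwise disjoint."""
    M: List[Tup] = []
    used: Set[Hashable] = set()
    for T in S:
        if used.isdisjoint(T):
            M.append(T)
            used.update(T)
    return M


def table_says_yes(S: Sequence[Tup], M_values: Set[Hashable],
                   phi: Dict[Hashable, int], W: Set[Hashable],
                   n_colours: int, r: int, k: int) -> bool:
    """True iff B_{phi,W}(Z, C) = 1 for some Z ⊆ W, C ⊆ X with |Z|+|C| = rk."""
    sigs = set()
    for T in S:
        Zp = frozenset(v for v in T if v in M_values)
        Cp = frozenset(phi[v] for v in T if v not in M_values)
        if Zp and Zp <= W and len(Zp) + len(Cp) == r:  # P_phi(Z', C') = 1
            sigs.add((Zp, Cp))

    @lru_cache(maxsize=None)
    def B(Z: frozenset, C: frozenset) -> bool:
        if not Z and not C:
            return True
        return any(Zp <= Z and Cp <= C and B(Z - Zp, C - Cp) for Zp, Cp in sigs)

    for z in range(min(len(W), r * k) + 1):
        if r * k - z > n_colours:
            continue
        for Z in combinations(list(W), z):
            for C in combinations(range(n_colours), r * k - z):
                if B(frozenset(Z), frozenset(C)):
                    return True
    return False


def color_coding_matching(S: Sequence[Tup], k: int, trials: int = 200,
                          seed: int = 0) -> bool:
    """Randomized sketch of Color-Coding-Matching (assumes r >= 2)."""
    if k <= 0:
        return True
    if not S:
        return False
    r = len(S[0])
    M = greedy_maximal_matching(S)
    if len(M) >= k:
        return True
    M_values = set(chain.from_iterable(M))
    V = set(chain.from_iterable(S)) - M_values   # the values to be coloured
    n_colours = (r - 1) * k                       # |X| = (r - 1)k
    rng = random.Random(seed)
    for _ in range(trials):
        phi = {v: rng.randrange(n_colours) for v in V}  # one random colouring
        for w in range(len(M_values) + 1):
            for W in combinations(list(M_values), w):
                if table_says_yes(S, M_values, phi, set(W), n_colours, r, k):
                    return True
    return False  # may be a false "No" if every colouring was unlucky
```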



5 Kernels and FPT Algorithms for the Other Problems 

We can now define other problems addressable by our technique. To avoid introducing new notation, each of them will be defined in terms of sets of tuples, even though order within a tuple is not important in some cases. For r-Set Packing, the input r-tuples are subsets of elements from a base set A, each of size at most r, and the goal is to find at least k r-tuples, none of which share elements.
r-Set Packing

Instance: A collection $\mathcal{C}$ of sets drawn from a set A, each of size at most r.
Parameter: A non-negative integer k.

Question: Does $\mathcal{C}$ contain at least k mutually disjoint sets, that is, for $S \subseteq \mathcal{A} = A^r$, is there a subset $\mathcal{P}$ of S where for any distinct $T, T' \in \mathcal{P}$, $\mathrm{val}(T) \cap \mathrm{val}(T') = \emptyset$ and $|\mathcal{P}| \geq k$?

In order to define the graph problems, we define G[S] to be the subgraph of G induced on the vertex set $S \subseteq V(G)$, namely the graph $G' = (V(G'), E(G'))$, where $V(G') = S$ and $E(G') = \{(u, v) \mid u, v \in S, (u, v) \in E(G)\}$. Moreover, we use $H \subseteq G$ to denote that H is a (not necessarily induced) subgraph of G.
Graph Packing

Instance: Two graphs $G = (V(G), E(G))$ and $H = (V(H), E(H))$.

Parameter: A non-negative integer k.

Question: Does G contain at least k vertex-disjoint subgraphs each isomorphic to H, that is, for $S \subseteq \mathcal{A} = V(G)^r$, is there a subset $\mathcal{P}$ of S where for any distinct $T, T' \in \mathcal{P}$, $\mathrm{val}(T) \cap \mathrm{val}(T') = \emptyset$, there is a subgraph $G' \subseteq G[\mathrm{val}(T)]$ that is isomorphic to H for any $T \in \mathcal{P}$, and $|\mathcal{P}| \geq k$?

Graph Edge-Packing

Instance: Two graphs $G = (V(G), E(G))$ and $H = (V(H), E(H))$ such that H has no isolated vertices.

Parameter: A non-negative integer k.

Question: Does G contain at least k edge-disjoint subgraphs each isomorphic to H, that is, for $S \subseteq \mathcal{A} = E(G)^r$, is there a subset $\mathcal{P}$ of S where for any distinct $T, T' \in \mathcal{P}$, $\mathrm{val}(T) \cap \mathrm{val}(T') = \emptyset$, there is a subgraph G' that is isomorphic to H such that each edge of G' is in T for any $T \in \mathcal{P}$, and $|\mathcal{P}| \geq k$?

To obtain kernels for the above problems, we can adapt the ideas developed for r-Dimensional Matching. As in that case, we first apply a reduction rule that limits the number of tuples containing a particular value or set of values, and then use a maximal set of disjoint objects to bound the size of the kernel. In the remainder of this section we detail the differences among the solutions for the various problems.

For each of these problems, the tuples in question are drawn from the same base set (A, V(G), or E(G), respectively), resulting in the tuples being sets of size at most r, as position within a tuple is no longer important. Since in the last two problems there is an additional constraint that the set of vertices or edges forms a graph isomorphic to H, the sets are forced to be of size exactly r; for r-Set Packing there is no such constraint, allowing smaller sets.

For all three problems, since the $A_i$’s are no longer disjoint, the potential for conflicts increases. As a consequence, we define a new function g for use in the function Reduce: $g(0, k) = 1$ and $g(\ell, k) = r \cdot \ell \cdot (k - 1) \cdot g(\ell - 1, k) + 1$ for all $\ell > 0$, for $r = |V(H)|$ and $r = |E(H)|$ in the latter two problems. By replacing f by g in the definition of the function Reduce, we obtain a new function SetReduce and the notion of a set being $(\ell, k)$-set-reduced, and by replacing Reduce by SetReduce in the function Fully-Reduce, we can obtain a new function Fully-SetReduce. Instead of finding a maximal matching, we find a maximal set of disjoint sets or a maximal set of disjoint sets of vertices or edges yielding subgraphs isomorphic to H in the function Kernel-Construct. It then remains to prove analogues of Lemmas 1 through 5; sketches of the changes are given below.
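The two thresholds can be compared numerically with the short Python sketch below; this is our own check, not from the paper. It verifies three things: the identity $r \cdot \ell \cdot (k-1) \cdot g(\ell-1, k) = g(\ell, k) - 1$ used in the analogue of Lemma 1, that g dominates f, and the simple bound $g(\ell, k) \leq \ell!\,(rk)^\ell$, which follows by the same induction as the bound on f.

```python
from math import factorial


def f(ell: int, k: int) -> int:
    """Threshold for r-Dimensional Matching: f(ell,k) = ell*(k-1)*f(ell-1,k) + 1."""
    value = 1
    for i in range(1, ell + 1):
        value = i * (k - 1) * value + 1
    return value


def g(ell: int, k: int, r: int) -> int:
    """Threshold for the set problems: g(ell,k) = r*ell*(k-1)*g(ell-1,k) + 1."""
    value = 1
    for i in range(1, ell + 1):
        value = r * i * (k - 1) * value + 1
    return value


for r in range(2, 6):
    for k in range(1, 6):
        for ell in range(0, r):
            # The extra factor r accounts for a shared value sitting at any position.
            assert g(ell, k, r) >= f(ell, k)
            if ell > 0:
                assert r * ell * (k - 1) * g(ell - 1, k, r) == g(ell, k, r) - 1
            # A simple inductive bound (our own check, not stated in the paper).
            assert g(ell, k, r) <= factorial(ell) * (r * k) ** ell

print(f(2, 3), g(2, 3, 3))  # 13 85
```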

Each of the lemmas can be modified by replacing f by g, Reduce by SetReduce, and Fully-Reduce by Fully-SetReduce, with the analogue of Lemma 5 yielding a kernel of size $O(r^r k^r)$. The need for g instead of f is evident in the proof of the claim in the analogue of Lemma 1. Here we count the number of retained r-tuples T' (those kept from S[R]) such that $\mathrm{val}(T') \cap \mathrm{val}(T'') \neq \emptyset$ for some $T'' \in \mathcal{P} - \{T\}$. Since the r-tuples are in fact sets, positions have no meaning. Hence the number of such r-tuples is at most the product of four factors: the number of positions per tuple (that is, r), the number of free positions ($\ell$, any one of which can be fixed to form R'), the number of tuples in $\mathcal{P} - \{T\}$ (that is, k − 1), and $g(\ell - 1, k)$. In total this gives us $r \cdot \ell \cdot (k - 1) \cdot g(\ell - 1, k) = g(\ell, k) - 1$ retained elements with nonempty intersection with values in elements of $\mathcal{P} - \{T\}$, as needed to show that there is a T' with the required property. We then have the following lemma, analogous to Lemma 5:

Lemma 6. For the problems r-Set Packing, Graph Packing, and Graph Edge-Packing, if $S \subseteq \mathcal{A}$ and $\mathcal{K}$ is the output of the function Kernel-Construct(S), then (a) $|\mathcal{K}| \in O(r^r k^r)$ and (b) S has a set of at least k objects (disjoint sets, vertex-disjoint subgraphs isomorphic to H, or edge-disjoint subgraphs isomorphic to H, respectively) if and only if $\mathcal{K}$ has a set of at least k such objects.

To obtain algorithms for these problems, we adapt the ideas developed for r-Dimensional Matching; as in that case we form a kernel, find a maximal set of disjoint sets or graphs, and then use dynamic programming and color-coding to find a solution of size k, if one exists.

Our algorithms differ from that for r-Dimensional Matching only in the definitions of $B_{\phi,W}(Z, C)$ and $P_\phi$. In particular, the conditions for $B_{\phi,W}(Z, C)$ to be 1 are as follows.
For r-Set Packing: there exists a set $\mathcal{P} \subseteq S$ of disjoint sets where $|\mathcal{P}| = (|Z| + |C|)/r$, $\mathrm{val}(\mathcal{P}) \cap \mathrm{val}(\mathcal{M}) = Z$, and $\phi(\mathrm{val}(\mathcal{P}) - Z) = C$.
For Graph Packing: there exists a set $\mathcal{P} \subseteq S$ such that $|\mathcal{P}| = (|Z| + |C|)/r$, for any distinct $T, T' \in \mathcal{P}$, $\mathrm{val}(T) \cap \mathrm{val}(T') = \emptyset$, for each $T \in \mathcal{P}$ there is a subgraph $G' \subseteq G[\mathrm{val}(T)]$ that is isomorphic to H, $\mathrm{val}(\mathcal{P}) \cap \mathrm{val}(\mathcal{M}) = Z$, and $\phi(\mathrm{val}(\mathcal{P}) - Z) = C$.
For Graph Edge-Packing: there exists a set $\mathcal{P} \subseteq S$ such that $|\mathcal{P}| = (|Z| + |C|)/r$, for any distinct $T, T' \in \mathcal{P}$, $\mathrm{val}(T) \cap \mathrm{val}(T') = \emptyset$, for each $T \in \mathcal{P}$ there is a subgraph G' that is isomorphic to H such that each edge of G' is in T, $\mathrm{val}(\mathcal{P}) \cap \mathrm{val}(\mathcal{M}) = Z$, and $\phi(\mathrm{val}(\mathcal{P}) - Z) = C$.
In addition, we alter the conditions for $P_\phi$ to be 1 in a similar manner, though no alteration is needed for r-Set Packing. For Graph Packing we add the condition that there is a subgraph $G' \subseteq G[\mathrm{val}(T)]$ that is isomorphic to H, and for Graph Edge-Packing we add the condition that there is a subgraph G' that is isomorphic to H such that each edge of G' is in T.

The analysis of the algorithms depends on Lemma 6 and, in the graph problems, the need to find all copies of H in G, resulting in the theorem below.
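The paper does not say how the copies of H are located. One simple possibility, shown here as a brute-force Python sketch under that assumption, builds the candidate tuples for Graph Packing by testing every |V(H)|-subset U of V(G) and every bijection from V(H) to U. This takes roughly $O(|V(G)|^{|V(H)|} \cdot |V(H)|! \cdot |E(H)|)$ time. The function name is hypothetical.

```python
from itertools import combinations, permutations
from typing import FrozenSet, Hashable, Iterable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def candidate_vertex_sets(G_vertices: Sequence[Hashable], G_edges: Iterable[Edge],
                          H_vertices: Sequence[Hashable],
                          H_edges: Iterable[Edge]) -> List[FrozenSet]:
    """All r-subsets U of V(G), r = |V(H)|, such that G[U] has a subgraph
    isomorphic to H (brute force over r! bijections per subset)."""
    E = {frozenset(e) for e in G_edges}
    H_edge_list = list(H_edges)
    r = len(H_vertices)
    found = []
    for U in combinations(G_vertices, r):
        for image in permutations(U):
            m = dict(zip(H_vertices, image))  # candidate bijection V(H) -> U
            if all(frozenset((m[a], m[b])) in E for a, b in H_edge_list):
                found.append(frozenset(U))
                break                        # one copy is enough for U
    return found


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
triangles = candidate_vertex_sets(range(5), edges, [0, 1, 2], [(0, 1), (1, 2), (0, 2)])
print(sorted(sorted(U) for U in triangles))
```

In the example, the two triangles share vertex 2. Any vertex-disjoint packing can therefore use at most one of them.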

Theorem 2. The problems r-Set Packing, Graph Packing, and Graph Edge-Packing can be solved in time $O(n + 2^{O(rk)})$, $O(|V(G)|^{|V(H)|} + 2^{O(rk)})$, and $O(|E(G)|^{|E(H)|} + 2^{O(rk)})$, respectively.

6 Choosing Hash Functions 

To construct the family of hash functions $\mathcal{F} = \{f : U \to X\}$ with the property that for any subset S of U with $|S| = |X|$, there is an $f \in \mathcal{F}$ that is 1-1 on S, Alon et al. used results from the theory of perfect hash functions together with derandomization of small sample spaces that support almost $\ell$-wise independent random variables. They were able to construct a family $\mathcal{F}$ with $|\mathcal{F}| = 2^{O(|X|)} \log |U|$, and this is what we used in Section 4. However, they made no attempt to optimize the constant hidden in the O-notation, since they were assuming |X| fixed and |U| equal to the size of the input.

In our case (and in any practical implementation of the algorithms of Alon et al.), it would be preferable to lower the constant in the exponent as much as possible. We have one advantage: kernelization means that |U| in our case is $O(k^r)$, not O(n), and so we are less concerned about dependence on |U| (note that $|X| = (r - 1)k$ in our case). Two other optimizations are applicable both to our situation and that of Alon et al. First, we can allow more colours, i.e. $|X| = \alpha(r - 1)k$ for some constant $\alpha$. This will increase the number of dynamic programming problems, since the number of choices for C (in determining the number of $B_{\phi,W}(Z, C)$) grows from $2^{(r-1)k}$ to about $(e\alpha)^{(r-1)k}$. But this may be offset by the reduction in the size of the family of hash functions, since allowing more colours makes it easier to find a perfect hash function. Second,



Faster Fixed-Parameter Tractable Algorithms 321 

for some of the work on the theory of perfect hash functions used by Alon et al., it is important that the hash functions be computable in O(1) time, whereas in our application, we can allow time polynomial in k.

As an example of applying such optimization, we can make use of the work of Slot and van Emde Boas [SvEB85]. They give a scheme based on the pioneering work of Fredman, Komlós, and Szemerédi [FKS82] that in our case results in a family $\mathcal{F}$ with $|X| = 6|S|$, for which it takes O(k) time to compute the value of a hash function. We will not explore this line of improvement further beyond underlining that there is a tradeoff between the power of the family of hash functions we use and its size, and we have some latitude in choosing families with weaker properties.

Another possibility, also discussed by Alon et al., is to replace the deterministic search for a hash function (in a family where the explicit construction is complicated) by a random choice of colouring. A single random 2|S|-colouring of U is 1-1 on a given subset S with probability at least $(2/e)^{|S|}$, so repeating the random choice $2^{O(|S|)}$ times makes the failure probability exponentially small in |S|. However, this means that a “no” answer has a small probability of error, since it could be due to the failure to find a perfect hash function.
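The randomized alternative can be quantified with a short Python sketch, which is our own illustration. For m = |S| elements and a uniformly random colouring with 2m colours, the probability of being 1-1 on S is $\prod_{i<m} (1 - i/2m)$, roughly $\sqrt{2}\,(2/e)^m$. The sketch computes this probability and the number of independent trials needed to push the failure probability below a chosen δ.

```python
from math import ceil, e, log, prod


def injective_probability(m: int, n_colours: int) -> float:
    """Probability that a uniformly random colouring with n_colours colours
    gives m fixed elements pairwise distinct colours."""
    if m > n_colours:
        return 0.0
    return prod((n_colours - i) / n_colours for i in range(m))


def trials_needed(m: int, n_colours: int, delta: float) -> int:
    """Smallest t with (1 - p)^t <= delta for t independent random colourings."""
    p = injective_probability(m, n_colours)
    if p <= 0.0:
        raise ValueError("more elements than colours: no colouring is 1-1")
    if p >= 1.0:
        return 1
    return ceil(log(delta) / log(1.0 - p))


for m in (5, 10, 20):
    p = injective_probability(m, 2 * m)  # a 2|S|-colouring with |S| = m
    print(m, round(p, 4), trials_needed(m, 2 * m, 1e-6))
```

The success probability of one colouring drops exponentially with m, so the number of trials grows exponentially. This is why the randomized variant still has a $2^{O(|S|)}$ factor.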

7 Conclusions 

Our results can be extended to handle problems such as packing or edge-packing graphs from a set of supplied graphs, and more generally to disjoint structures. To express all problems in a general framework, we view tuples as directed hypergraph edges, sets of tuples as hypergraphs, and inputs as triples of a sequence $\mathcal{C}$ of hypergraphs, a matching-size parameter $\ell$, and a supplied hypergraph G.

To define the necessary terms, we consider the projection of tuples and an associated notion of independence. Given a tuple $T = (v_1, \ldots, v_{r'})$ where $r' \leq r$ and a subset of indices $I = (i_1, \ldots, i_\ell) \subseteq \{1, \ldots, r\}$, we define the projection of T on I as $T[I] = (v_{i_1}, \ldots, v_{i_m})$ where $m = \max(\{1, \ldots, r'\} \cap \{i_1, \ldots, i_\ell\})$. To form a set of related tuples, given a hypergraph S and a subset $I \subseteq \{1, \ldots, r\}$ such that $|I| = \ell$, we denote as $w(S, I)$ the set $\{T' \mid$ there exists a tuple $T \in S$ such that $T[I] = T'\}$. We then say that a subsequence $\mathcal{C}' \subseteq \mathcal{C}$ is an $\ell$-matching if for any pair of hypergraphs $S, S' \in \mathcal{C}'$ and for all subsets of indices $I \subseteq \{1, \ldots, r\}$ of size $\ell$, $w(S, I) \cap w(S', I) = \emptyset$.

Our problem is then one of finding a subsequence $\mathcal{C}'$ of the hypergraphs forming an $\ell$-matching of size at least k such that the elements of each hypergraph in $\mathcal{C}'$ can be restricted to a smaller structure isomorphic to G. This problem subsumes all problems in this paper, and can be solved in time O(n +
Abstract. We introduce a model to study the temporal behaviour of selfish agents in networks. So far, most of the analysis of selfish routing is concerned with static properties of equilibria, the notion of equilibrium being one of the most fundamental paradigms in classical Game Theory. By adopting a generalised approach of Evolutionary Game Theory we extend the model of selfish routing to study the dynamical behaviour of agents.

For symmetric games corresponding to single-commodity flow, we show that the game converges to a Nash equilibrium in a restricted strategy space. In particular we prove that the time for the agents to reach an $\epsilon$-approximate equilibrium is polynomial in $\epsilon$ and only logarithmic in the ratio between maximal and optimal latency. In addition, we present an almost matching lower bound in the same parameters.

Furthermore, we extend the model to asymmetric games corresponding to multicommodity flow. Here we also prove convergence to restricted Nash equilibria, and we derive upper bounds for the convergence time that are linear instead of logarithmic.



1 Introduction 

Presently, the application of Game Theory to networks and congestion games is 
gaining a growing amount of interest in Theoretical Computer Science. One of the 
most fundamental paradigms of Game Theory is the notion of Nash equilibria. 
So far, many results on equilibria have been derived. Most of these are concerned 
with the ratio between average cost at an equilibrium and the social optimum, 
mostly referred to as the coordination ratio or the price of anarchy [2,7,8,13], 
ways to improve this ratio, e.g. by taxes [1], and algorithmic complexity and
efficiency of computing such equilibria [3,4,5]. 

Classical Game Theory is based on fully rational behaviour of players and 
global knowledge of all details of the game under study. For routing games in 
large networks like the Internet, these assumptions are clearly far from realistic. Evolutionary Game Theory makes a different attempt to explain
why large populations of agents may or may not “converge” towards equilibrium 
states. This theory is mainly based on the so-called replicator dynamics, a model 
of an evolutionary process in which agents revise their strategies from time to 
time based on local observations. 

* Supported in part by the EU within the 6th Framework Programme under contract 001907 (DELIS) and by DFG grant Vo889/1-2.

S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 323-334, 2004. 
In this paper, we apply Evolutionary Game Theory to selfish routing. This enables us to study the dynamics of selfish routing rather than only the static structure of equilibria. We prove that the only fixed points of the replicator dynamics are Nash equilibria over a restricted strategy space. We prove that these fixed points are “evolutionary stable”, which implies that the replicator dynamics converges to one of them. One standard approach in Evolutionary Game Theory to prove stability is based on symmetry properties of payoff matrices. However, we cannot simply carry over these results. Our model of selfish routing allows for arbitrary latency functions, whereas the existing literature on Evolutionary Game Theory assumes affine payoff functions corresponding to linear latency functions with zero offset. In fact our proof of evolutionary stability is based on monotonicity instead of symmetry.

Another aspect that, to our knowledge, has been considered neither in
Evolutionary nor in classical Game Theory is the time it takes to reach or come
close to equilibria, i.e., the speed of convergence. We believe that this issue is
of particular importance, as equilibria are only meaningful if they are reached in
reasonable time. In fact, we can prove that symmetric congestion games,
corresponding to single-commodity flow, converge very quickly to an approximate
equilibrium. For asymmetric congestion games our bounds are slightly weaker.

The well-established models of selfish routing and Evolutionary Game Theory
are described in Section 2. In Section 3 we show how a generalisation of the
latter can be applied to the former. In Section 4 we present our results on the
speed of convergence. All proofs are collected in Section 5. We finish with some
conclusions and open problems.

2 Known Models 

2.1 Selfish Routing 

Consider a network $G = (V, E)$ with latency functions $l_e : [0, 1] \to \mathbb{R}$,
$e \in E$, assigned to the edges and mapping load to latency, and a set of
commodities $I$. For commodity $i \in I$ there is a fixed flow demand of $d_i$ that is
to be routed from source $s_i$ to sink $t_i$. The total flow demand of all
commodities is 1. Denote the set of $s_i$-$t_i$-paths by $P_i \subseteq \mathcal{P}(E)$, where
$\mathcal{P}(E)$ is the power set of $E$. For simplicity of notation we assume that the
$P_i$ are disjoint, which is certainly true if the pairs $(s_i, t_i)$ are pairwise
distinct. Then there is a unique commodity $i(p)$ associated with each path
$p \in P$, where $P = \bigcup_{i \in I} P_i$.

For $p \in P$ denote the amount of flow routed over path $p$ by $x_p$. We combine
the individual values $x_p$ into a vector $x$. The set of legal flows is the simplex¹
$\Delta := \{x \ge 0 \mid \forall i \in I: \sum_{p \in P_i} x_p = d_i\}$. Furthermore, for $e \in E$,
$x_e := \sum_{p \ni e} x_p$ is the total load of edge $e$. The latency of edge $e \in E$ is
$l_e(x_e)$ and the latency of path $p$ is

$$l_p(x) = \sum_{e \in p} l_e(x_e).$$

¹ Strictly speaking, $\Delta$ is not a simplex, but a product of $|I|$ simplices scaled by $d_i$.
The average latency with respect to commodity $i \in I$ is

$$\bar l_i(x) := \frac{1}{d_i} \sum_{p \in P_i} x_p \, l_p(x).$$
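To make these definitions concrete, here is a minimal Python sketch. It assumes, for illustration only, that a path is represented as a tuple of edge names and each latency function as a Python callable; it computes the edge loads $x_e$, the path latencies $l_p(x)$ and the commodity average $\bar l_i(x)$ as defined above.

```python
from typing import Callable, Dict, Iterable, Tuple

Path = Tuple[str, ...]                          # a path is a tuple of edge names (assumption)
Latency = Dict[str, Callable[[float], float]]   # edge name -> latency function l_e


def edge_loads(flow: Dict[Path, float]) -> Dict[str, float]:
    """x_e = sum of x_p over all paths p that contain edge e."""
    loads: Dict[str, float] = {}
    for path, x_p in flow.items():
        for e in path:
            loads[e] = loads.get(e, 0.0) + x_p
    return loads


def path_latency(path: Path, loads: Dict[str, float], latency: Latency) -> float:
    """l_p(x) = sum over the edges e of p of l_e(x_e)."""
    return sum(latency[e](loads[e]) for e in path)


def average_latency(paths_i: Iterable[Path], flow: Dict[Path, float],
                    latency: Latency) -> float:
    """bar l_i(x) = (1/d_i) * sum_{p in P_i} x_p * l_p(x)."""
    paths_i = list(paths_i)
    loads = edge_loads(flow)
    d_i = sum(flow[p] for p in paths_i)
    return sum(flow[p] * path_latency(p, loads, latency) for p in paths_i) / d_i


# Example: two parallel edges with l_a(u) = u and l_b(u) = 1 (single commodity, d = 1).
latency = {"a": lambda u: u, "b": lambda u: 1.0}
flow = {("a",): 0.5, ("b",): 0.5}
print(average_latency(flow.keys(), flow, latency))  # 0.5*0.5 + 0.5*1.0 = 0.75
```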

In a more general setting one could abstract from graphs and consider arbitrary
sets of resources $E$ and arbitrary non-empty subsets of $E$ as legal strategies. We
do not do this for simplicity, but all our considerations below also hold for this
general case.

In order to study the behaviour of users in networks, one assumes that there
is an infinite number of agents, each carrying an infinitesimal amount of flow. In
this context the flow vector $x$ can also be seen as a population of agents, where
$x_p$ is the non-integral “number” of agents selecting strategy $p$.

The individual agents strive to choose a path that minimises their personal
latency regardless of social benefit. This does not impute maliciousness to the
agents; it simply arises from their inability to coordinate strategies.
One can ask: What population will arise if we assume complete rationality and
if all agents have full knowledge about the network and the latency functions?

This is where the notion of an equilibrium comes into play. A flow or population
$x$ is said to be at Nash equilibrium [11] when no agent has an incentive to change
its strategy. The classical existence result by Nash holds if mixed strategies are
allowed, i.e., agents may choose their strategy according to a random
distribution. However, mixed strategies and an infinite population of agents
using pure strategies are essentially equivalent. A very useful characterisation
of a Nash equilibrium is the Wardrop equilibrium [6].

Definition 1 (Wardrop equilibrium). A flow $x$ is at Wardrop equilibrium if
and only if for all $i \in I$ and all $p, p' \in P_i$ with $x_p > 0$ it holds that $l_p(x) \le l_{p'}(x)$.
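Definition 1 can be checked directly once the path latencies are known. The following sketch assumes that $l_p(x)$ has already been evaluated at the current flow and is passed together with $x_p$ as pairs, grouped by commodity; the tolerance `tol` is our own addition to absorb floating-point rounding.

```python
from typing import List, Tuple


def is_wardrop(commodities: List[List[Tuple[float, float]]], tol: float = 1e-9) -> bool:
    """Definition 1: within each commodity, every used path (x_p > 0) has minimal
    latency. Each inner list holds (x_p, l_p(x)) pairs for one commodity."""
    for paths in commodities:
        cheapest = min(l_p for _, l_p in paths)
        if any(x_p > 0 and l_p > cheapest + tol for x_p, l_p in paths):
            return False
    return True


# Two parallel links with l_a(u) = u and l_b(u) = 1.
print(is_wardrop([[(1.0, 1.0), (0.0, 1.0)]]))  # all flow on a: l_a = 1 = l_b -> True
print(is_wardrop([[(0.5, 0.5), (0.5, 1.0)]]))  # b is used but more expensive -> False
```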

Several interesting static aspects of equilibria have been studied, among them 
the famous price of anarchy, which is the ratio between average latency at a 
Nash equilibrium and optimal average latency. 

2.2 Evolutionary Game Theory 

Classical Game Theory assumes that all agents, equipped with complete knowledge
about the game and full rationality, will come to a Wardrop equilibrium. However,
these assumptions seem far from realistic when it comes to networks. Evolutionary
Game Theory gets rid of these assumptions by modelling the agents’ behaviour in a
very natural way that only requires the agents to observe their own and other
agents’ payoff and strategy and to change their own strategies based on these
observations. Starting with an initial population vector $x(0)$, one is interested in
the derivative $\dot x$ of the population vector $x(t)$ with respect to time.

Originally, Evolutionary Game Theory derives dynamics for large populations
of individuals from symmetric two-player games. Any such game can be
described by a matrix $A = (a_{ij})$, where $a_{ij}$ is the payoff of strategy $i$ when played
against strategy $j$. Suppose that two individuals are drawn at random from a
large population to play a game described by the payoff matrix $A$. Let $x_i$ denote
the fraction of players playing strategy $i$ at some given point of time. Then the
expected payoff of an agent playing strategy $i$ against an opponent chosen at
random from population $x$ is the $i$-th component $(Ax)_i$ of the matrix product $Ax$.
The average payoff of the entire population is $x \cdot Ax$, where $\cdot$ denotes the
inner, or scalar, product.
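A small numerical sketch of these two quantities, using a made-up 2x2 coordination game (not from the text):

```python
from typing import List


def expected_payoffs(A: List[List[float]], x: List[float]) -> List[float]:
    """(Ax)_i: expected payoff of strategy i against an opponent drawn from x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def average_payoff(A: List[List[float]], x: List[float]) -> float:
    """x . Ax: average payoff of the whole population."""
    return sum(x_i * p_i for x_i, p_i in zip(x, expected_payoffs(A, x)))


A = [[1.0, 0.0],
     [0.0, 2.0]]   # hypothetical coordination game
x = [0.5, 0.5]
print(expected_payoffs(A, x))  # [0.5, 1.0]
print(average_payoff(A, x))    # 0.75
```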

Consider an initial population in which each agent is assigned a pure strategy.
At each point of time, each agent plays against an opponent chosen uniformly
at random. The agent observes its own and its opponent’s payoff and decides to
imitate its opponent by adopting its strategy with probability proportional to
the payoff difference. One could argue that often it is not possible to observe
the opponent’s payoff. In that case, consider a random aspiration level for each
agent. Whenever an agent falls short of this level, it adopts a randomly observed
strategy. Interestingly, both scenarios lead to dynamics that can be described
by the differential equation



$$\dot x_i = \lambda(x) \cdot x_i \cdot \big((Ax)_i - x \cdot Ax\big) \tag{1}$$

for some positive function $\lambda$. This equation has interesting and desirable
properties. First, if we assume homogeneous agents, the growth rate of strategy $i$
should clearly be proportional to the number of agents already playing this
strategy. Then we have $\dot x_i = x_i \cdot g_i(x)$. Secondly, the growth rate $g_i(x)$
should increase with payoff, i.e., $(Ax)_i > (Ax)_j$ should imply $g_i(x) > g_j(x)$ and
vice versa. Dynamics having this property are called monotone. To extend
this, we say that $g$ is aggregate monotone if for inhomogeneous (sub)populations
the total growth rate of the subpopulations increases with the average payoff of
the subpopulation, i.e., for vectors $y, z \in \Delta$ it holds that $y \cdot Ax < z \cdot Ax$ if and
only if $y \cdot g(x) < z \cdot g(x)$. It is known that all aggregate monotone dynamics
can be written in the form of equation (1). In Evolutionary Game Theory the
most common choice for $\lambda$ seems to be the constant function 1. For $\lambda(x) = 1$,
this dynamics is known as the replicator dynamics.
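Equation (1) can be integrated numerically. The sketch below is only an illustration: it uses plain explicit Euler steps with a step size `h` of our choosing, takes $\lambda(x) = 1$ (the replicator dynamics), and runs on a made-up 2x2 coordination game in which strategy 2 pays more once it is common. The shares keep summing to 1, because $\sum_i x_i((Ax)_i - x \cdot Ax) = 0$ whenever $\sum_i x_i = 1$.

```python
from typing import Callable, List


def replicator_step(A: List[List[float]], x: List[float], h: float,
                    lam: Callable[[List[float]], float] = lambda x: 1.0) -> List[float]:
    """One explicit Euler step of x_i' = lam(x) * x_i * ((Ax)_i - x.Ax)."""
    payoff = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    avg = sum(xi * pi for xi, pi in zip(x, payoff))
    factor = lam(x)
    return [xi + h * factor * xi * (pi - avg) for xi, pi in zip(x, payoff)]


A = [[1.0, 0.0],
     [0.0, 2.0]]               # hypothetical coordination game
x = [0.5, 0.5]
for _ in range(2000):          # integrate up to time t = 20 with h = 0.01
    x = replicator_step(A, x, h=0.01)
print(x)                       # strategy 2 takes over; sum(x) stays 1 up to rounding
```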

Since the differential equation (1) contains cubic terms, there is no general
method for solving it. It is known that, under some reasonable conditions, fixed
points of this dynamics coincide with Nash equilibria. For a more comprehensive
introduction to Evolutionary Game Theory see, for example, [14].

3 Evolutionary Selfish Flow 

We extend the dynamics known from Evolutionary Game Theory to our model of 
selfish routing. We will see that there is a natural generalisation of equation (1) 
for our scenario. 

3.1 Dynamics for Selfish Routing 

First consider symmetric games corresponding to single-commodity flow. In
congestion games latency relates to payoff, but with opposite sign. Unless we have
linear latency functions without offset, we cannot express the payoff by means
of a matrix $A$. Therefore we replace the payoff $(Ax)_p$ of strategy $p$ by the latency
$l_p(x)$, and the average payoff $x \cdot Ax$ by the average latency
$\bar l(x) = \sum_{p \in P} x_p l_p(x)$, reversing the sign of the difference. Altogether we have

$$\dot x_p = \lambda(x) \cdot x_p \cdot \big(\bar l(x) - l_p(x)\big) \tag{2}$$

Evolutionary Game Theory also allows for asymmetric games, i.e., games where
each agent belongs to a class that determines its set of legal strategies. In
principle, asymmetric games correspond to multicommodity flow. The suggested
generalisations for asymmetric games, however, are not particularly useful in our
context, since they assume that agents of the same class do not play against each
other. We suggest a simple and natural generalisation towards multicommodity
flow. Agents behave exactly as in the single-commodity case, but they compare
their own latency to the average latency over the agents in the same class.
Although in Evolutionary Game Theory $\lambda = 1$ seems to be the most common
choice even for asymmetric games, we suggest choosing the factor $\lambda$ dependent
on the commodity. We will see later why this might be useful. This gives the
following variant of the replicator dynamics:

$$\dot x_p = \lambda_{i(p)}(x) \cdot x_p \cdot \big(\bar l_{i(p)}(x) - l_p(x)\big) \tag{3}$$

We believe that this is, in fact, a realistic model of communication in networks,
since agents only need to “communicate” with agents having the same source
and destination nodes.

In order for equation (3) to constitute a legal dynamics, we must ensure that
for all $t$ the population shares of each commodity $i$ sum up to its flow demand $d_i$.

Proposition 1. Let $x(t)$ be a solution of equation (3). Then $x(t) \in \Delta$ for all $t$.

This is proved in Section 5. Note that strategies that are not present in the initial
population are not generated by the dynamics. Conversely, strategies with a
positive share never become completely extinct.
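The right-hand side of equation (3), and the invariant stated in Proposition 1, can be checked numerically. The sketch assumes that the path latencies have been evaluated at the current flow and that the commodity-dependent factors $\lambda_i$ are given as plain numbers; all names and numbers are illustrative.

```python
from typing import Dict, Hashable


def multicommodity_derivative(flow: Dict[str, float], commodity_of: Dict[str, Hashable],
                              path_latency: Dict[str, float],
                              lam: Dict[Hashable, float]) -> Dict[str, float]:
    """x_p' = lam_{i(p)} * x_p * (bar_l_{i(p)}(x) - l_p(x)), equation (3)."""
    demand: Dict[Hashable, float] = {}
    weighted: Dict[Hashable, float] = {}
    for p, x_p in flow.items():
        i = commodity_of[p]
        demand[i] = demand.get(i, 0.0) + x_p
        weighted[i] = weighted.get(i, 0.0) + x_p * path_latency[p]
    avg = {i: weighted[i] / demand[i] for i in demand}     # bar_l_i(x)
    return {p: lam[commodity_of[p]] * x_p * (avg[commodity_of[p]] - path_latency[p])
            for p, x_p in flow.items()}


flow = {"p1": 0.3, "p2": 0.1, "q1": 0.2, "q2": 0.4}
commodity_of = {"p1": 0, "p2": 0, "q1": 1, "q2": 1}
path_latency = {"p1": 2.0, "p2": 1.0, "q1": 0.5, "q2": 3.0}
dx = multicommodity_derivative(flow, commodity_of, path_latency, lam={0: 1.0, 1: 0.5})
# Proposition 1: the derivatives of each commodity sum to zero (up to rounding).
print(dx["p1"] + dx["p2"], dx["q1"] + dx["q2"])
```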

3.2 Stability and Convergence 

Maynard Smith and Price [9] introduced the concept of evolutionary stability,
which is stricter than the concept of a Nash equilibrium. Intuitively, a strategy is
evolutionary stable if it is at a Nash equilibrium and earns more against a mutant
strategy than the mutant strategy earns against itself. Adapted to selfish routing,
it can be defined as follows.

Definition 2 (evolutionary stable). A strategy $x$ is evolutionary stable if it
is at a Nash equilibrium and $x \cdot l(y) < y \cdot l(y)$ for all best replies $y \neq x$ to $x$.

A strategy $y$ is a best reply to strategy $x$ if no other strategy yields a better
payoff when played against $x$.

Proposition 2. Suppose that latency functions are strictly increasing. If $x$ is
at a Wardrop equilibrium, then $x \cdot l(y) < y \cdot l(y)$ for all $y \neq x$ (in particular, for
all best replies) and hence $x$ is evolutionary stable.
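Proposition 2 can be spot-checked on a small instance. The sketch assumes two parallel links with the strictly increasing latencies $l_1(u) = u$ and $l_2(u) = 2u$ (our choice). Their Wardrop equilibrium is $x = (2/3, 1/3)$, where both links have latency $2/3$, and on this instance $y \cdot l(y) - x \cdot l(y) = 3(y_1 - 2/3)^2$, which is positive for every $y \neq x$.

```python
from typing import Callable, List

latency: List[Callable[[float], float]] = [lambda u: u, lambda u: 2.0 * u]
x = [2.0 / 3.0, 1.0 / 3.0]        # Wardrop equilibrium of this network


def dot_latency(z: List[float], y: List[float]) -> float:
    """z . l(y) = sum_p z_p * l_p(y) on parallel links."""
    return sum(z_p * l(y_p) for z_p, l, y_p in zip(z, latency, y))


gaps = []
for k in range(101):              # populations y = (k/100, 1 - k/100), none equal to x
    y = [k / 100, 1 - k / 100]
    gaps.append(dot_latency(y, y) - dot_latency(x, y))
print(min(gaps) > 0)              # x . l(y) < y . l(y) for every sampled y
```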
Consider a Wardrop equilibrium $x$ and a population $y$ such that for some
strategy $p \in P$, $y_p = 0$ and $x_p > 0$. The replicator dynamics starting at $y$ does
not converge to $x$, since it never generates strategy $p$. Therefore we consider a
restricted strategy space $P'$ containing only the paths with positive initial flow,
and restricted Wardrop equilibria over this restricted strategy space.

Proposition 3. Suppose that for all $i, j \in I$, $\lambda_i = \lambda_j$ and $\lambda_i(x) > \epsilon$ for some
$\epsilon > 0$ and any $x \in \Delta$. Let $y(t)$ be a solution to the replicator dynamics (3) and
let $x$ be a restricted Wardrop equilibrium with respect to $y(0)$. Then $y$ converges
towards $x$, i.e., $\lim_{t \to \infty} \|y(t) - x\| = 0$.

Both propositions are proved in Section 5. 



4 Speed of Convergence 

In order to study the speed of convergence we suggest two modifications to
the standard approach of Evolutionary Game Theory. First, one must ensure
that the growth rate of the population shares does not depend on the scale
by which we measure latency; with $\lambda(x) = 1$ or any other constant, it does.
Therefore we suggest choosing $\lambda_i(x) = \bar l_i(x)^{-1}$. This choice arises quite
naturally if we assume that the probability of agents changing their strategy
depends on their latency relative to the current average latency. We call the
resulting dynamics the relative replicator dynamics.
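The scale invariance that motivates this choice is easy to see numerically. In the single-commodity sketch below (illustrative numbers, parallel links), multiplying all latencies by 10 leaves the relative replicator derivative unchanged, whereas with $\lambda = 1$ the derivative is multiplied by 10.

```python
from typing import List


def derivative(x: List[float], lat: List[float], relative: bool) -> List[float]:
    """x_p' = lam * x_p * (bar_l - l_p) with lam = 1/bar_l (relative) or lam = 1."""
    avg = sum(xp * lp for xp, lp in zip(x, lat))
    lam = 1.0 / avg if relative else 1.0
    return [lam * xp * (avg - lp) for xp, lp in zip(x, lat)]


x = [0.2, 0.8]
lat = [3.0, 1.0]
scaled = [10.0 * l for l in lat]       # same network, latency measured in other units
print(derivative(x, lat, True), derivative(x, scaled, True))    # identical
print(derivative(x, lat, False), derivative(x, scaled, False))  # differ by a factor of 10
```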

Secondly, the Euclidean distance $\|y - x\|$ is not a suitable measure of
approximation. Since “sleeping minorities on cheap paths” may grow arbitrarily
slowly, it may take arbitrarily long for the current population to come close to
the final Nash equilibrium in Euclidean distance. The idea behind our definition
of approximate equilibria is not to wait for these sleeping minorities.

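Definition 3 ($\epsilon$-approximate equilibrium). Let $P_\epsilon$ be the set of paths that
have latency at least $(1+\epsilon) \cdot \bar l$, i.e., $P_\epsilon = \{p \in P \mid l_p(x) \ge (1+\epsilon) \cdot \bar l(x)\}$, and let
$x_\epsilon := \sum_{p \in P_\epsilon} x_p$ be the number of agents using these paths. A population $x$ is
said to be at an $\epsilon$-approximate equilibrium if and only if $x_\epsilon \le \epsilon$.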

Note that, by our considerations above, $\epsilon$-approximate equilibria can be left
again when minorities start to grow.
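Definition 3 translates into a short check. This single-commodity sketch assumes the path latencies are already evaluated at $x$; the last example is a population far from the Wardrop equilibrium that nevertheless counts as $\epsilon$-approximate, because its minority sits on a cheap path.

```python
from typing import List


def is_eps_approximate(x: List[float], lat: List[float], eps: float) -> bool:
    """x_eps = share on paths with l_p >= (1 + eps) * bar_l; require x_eps <= eps."""
    avg = sum(xp * lp for xp, lp in zip(x, lat))
    x_eps = sum(xp for xp, lp in zip(x, lat) if lp >= (1 + eps) * avg)
    return x_eps <= eps


print(is_eps_approximate([0.95, 0.05], [1.0, 2.0], eps=0.1))    # True: 5% on the expensive path
print(is_eps_approximate([0.95, 0.05], [1.0, 2.0], eps=0.01))   # False
print(is_eps_approximate([0.999, 0.001], [2.0, 1.0], eps=0.1))  # True: sleeping minority on a cheap path
```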

We will give our bounds in terms of maximal and optimal latency. Denote
the maximum latency by $l_{\max} := \max_{p \in P} l_p(e_p)$, where $e_p$ is the unit vector for
path $p$. Let $l^* := \min_{x \in \Delta} \bar l(x)$ be the average latency at a social optimum.

Theorem 1. The replicator dynamics for general single-commodity flow
networks and non-decreasing latency functions converges to an $\epsilon$-approximate
equilibrium within time $O(\epsilon^{-3} \cdot \ln(l_{\max}/l^*))$.

This theorem is robust. The proof does not require that agents behave exactly
as described by the replicator dynamics. It is only necessary that a constant
fraction of the agents whose latency exceeds the average by a factor of at least
$1+\epsilon$ move to a better strategy. We can also show that our bound is in the right ballpark.
Theorem 2. For any $r := l_{\max}/l^*$ there exist a network and boundary
conditions such that the time to reach an $\epsilon$-approximate equilibrium is bounded
from below by $\Omega(\epsilon^{-1} \ln r)$.

For the multicommodity case, we can only derive a linear upper bound. If we
define $P_\epsilon := \{p \mid l_p \ge (1+\epsilon) \cdot \bar l_{i(p)}\}$, then the definition of $x_\epsilon$ and Definition 3
translate naturally to the multicommodity case. Let $l^*_i = \min_{x \in \Delta} \bar l_i(x)$ be the
minimal average latency for commodity $i \in I$ and let $l^* = \min_{i \in I} l^*_i$.

Theorem 3. The multicommodity replicator dynamics converges to an
$\epsilon$-approximate equilibrium within time $O(\epsilon^{-3} \cdot l_{\max}/l^*)$.

Our analysis of convergence in terms of $\epsilon$-approximate equilibria uses Rosenthal’s
potential function [12]. However, this function is not suitable for proving
convergence in terms of the Euclidean distance, since it does not give a general
upper bound. For that purpose we use the entropy as a potential function, which
in turn is not suitable for $\epsilon$-approximations. Because of our choice of the $\lambda_i$ in this
section, Proposition 3 cannot be translated directly to the multicommodity case.

5 Proofs 

First we prove that the relative replicator dynamics is a legal dynamics in the
context of selfish routing, i.e., it does not leave the simplex $\Delta$.

Proof (of Proposition 1). We show that for all commodities $i \in I$ the derivatives
$\dot x_p$, $p \in P_i$, sum up to 0:

$$\sum_{p \in P_i} \dot x_p = \lambda_i(x) \Big(\bar l_i(x) \sum_{p \in P_i} x_p - \sum_{p \in P_i} x_p l_p(x)\Big) = \lambda_i(x) \big(\bar l_i(x) \cdot d_i - d_i \cdot \bar l_i(x)\big) = 0.$$

Therefore, $\sum_{p \in P_i} x_p$ is constant. □

Now we prove evolutionary stability.

Proof (of Proposition 2). Let $x$ be a Nash equilibrium. We want to show that
$x$ is evolutionary stable. In a Wardrop equilibrium all latencies of used paths
belonging to the same commodity are equal. The latency of unused paths is
equal to or even greater than the latency of used paths. Therefore $x \cdot l(x) \le y \cdot l(x)$
for all populations $y$. As a consequence,

$$\begin{aligned}
y \cdot l(y) &\ge x \cdot l(x) + y \cdot l(y) - y \cdot l(x) \\
&= x \cdot l(x) + \sum_{p \in P} y_p \big(l_p(y) - l_p(x)\big) \\
&= x \cdot l(x) + \sum_{e \in E} y_e \big(l_e(y) - l_e(x)\big).
\end{aligned}$$
Consider an edge $e \in E$. There are three cases.

1. $y_e > x_e$. Because of strict monotonicity of $l_e$, it holds that $l_e(y) > l_e(x)$.
Therefore also $y_e(l_e(y) - l_e(x)) > x_e(l_e(y) - l_e(x))$.

2. $y_e < x_e$. Because of strict monotonicity of $l_e$, it holds that $l_e(y) < l_e(x)$.
Again, $y_e(l_e(y) - l_e(x)) > x_e(l_e(y) - l_e(x))$.

3. $y_e = x_e$. In that case $y_e(l_e(y) - l_e(x)) = x_e(l_e(y) - l_e(x))$.

There is at least one edge $e \in E$ with $x_e \neq y_e$ and, therefore,
$y_e(l_e(y) - l_e(x)) > x_e(l_e(y) - l_e(x))$. Altogether we have

$$y \cdot l(y) > x \cdot l(x) + \sum_{e \in E} x_e \big(l_e(y) - l_e(x)\big) = x \cdot l(y),$$

which is our claim. Note that this proof immediately covers the multicommodity
case. □



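Proof (of Proposition 3). Denote the Nash equilibrium by $x$ and the current
population by $y$. We define a potential function by the entropy

$$H_x(y) := \sum_{p \in P} x_p \ln \frac{x_p}{y_p}.$$

From information theory it is known that this function always exceeds the
square of the Euclidean distance $\|x - y\|^2$. We can also write this as
$H_x(y) = \sum_{p \in P} \big(x_p \ln(x_p) - x_p \ln(y_p)\big)$. Using the chain rule we calculate the
derivative with respect to time:

$$\dot H_x(y) = -\sum_{p \in P} \frac{x_p \, \dot y_p}{y_p}.$$

Now we substitute the replicator dynamics for $\dot y_p$, cancelling out the $y_p$. Since
all $\lambda_i$ coincide, we write $\lambda$ for their common value, and we use
$\sum_{p \in P_i} x_p = d_i$ and $d_i \, \bar l_i(y) = \sum_{p \in P_i} y_p l_p(y)$:

$$\begin{aligned}
\dot H_x(y) &= -\sum_{i \in I} \sum_{p \in P_i} \lambda(y) \, x_p \big(\bar l_i(y) - l_p(y)\big) \\
&= \lambda(y) \sum_{i \in I} \Big(\sum_{p \in P_i} x_p l_p(y) - \bar l_i(y) \cdot d_i\Big) \\
&= \lambda(y) \sum_{i \in I} \Big(\sum_{p \in P_i} x_p l_p(y) - \sum_{p \in P_i} y_p l_p(y)\Big) \\
&= \lambda(y) \sum_{i \in I} \sum_{p \in P_i} (x_p - y_p) \, l_p(y) \\
&= \lambda(y) \cdot (x - y) \cdot l(y).
\end{aligned}$$

Since $x$ is a Wardrop equilibrium, Proposition 2 implies that $(x - y) \cdot l(y) < 0$.
Furthermore, by our assumption $\lambda(y) > \epsilon > 0$. Altogether this implies that
$H_x(y)$, and therefore also $\|x - y\|^2$, decreases towards the lower bound of $H_x$,
which is 0. □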
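The claim that $H_x(y)$ dominates $\|x - y\|^2$ can be spot-checked numerically for populations of total mass 1. (One way to see it: Pinsker's inequality gives $H_x(y) \ge \frac12 \|x - y\|_1^2$, and for two vectors of equal total mass $\|x - y\|_1^2 \ge 2\|x - y\|^2$.) The sketch samples random positive populations with a fixed seed; it is a sanity check, not a proof.

```python
import math
import random
from typing import List


def entropy_potential(x: List[float], y: List[float]) -> float:
    """H_x(y) = sum_p x_p * ln(x_p / y_p); terms with x_p = 0 contribute 0."""
    return sum(xp * math.log(xp / yp) for xp, yp in zip(x, y) if xp > 0)


def squared_distance(x: List[float], y: List[float]) -> float:
    """||x - y||^2 in the Euclidean norm."""
    return sum((xp - yp) ** 2 for xp, yp in zip(x, y))


def random_population(n: int, rng: random.Random) -> List[float]:
    """Random strictly positive shares summing to 1."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


rng = random.Random(0)
worst = math.inf
for _ in range(10000):
    x, y = random_population(4, rng), random_population(4, rng)
    worst = min(worst, entropy_potential(x, y) - squared_distance(x, y))
print(worst >= 0)   # H_x(y) >= ||x - y||^2 on all sampled pairs
```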
We now prove the bound on the time of convergence in the symmetric case.

Proof (of Theorem 1). For our proof we use a generalisation of Rosenthal’s
potential function [12]. In the discrete case, this potential function inserts the
agents sequentially into the network and sums up the latencies they experience
at the point of time they are inserted. In the continuous case this sum can be
generalised to an integral. As a technical trick, we furthermore add the average
latency at a social optimum $l^*$:

$$\Phi(x) := \sum_{e \in E} \int_0^{x_e} l_e(u) \, du + l^*. \tag{4}$$
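For affine latencies $l_e(u) = a_e u + b_e$ the integral in (4) has the closed form $a_e x_e^2/2 + b_e x_e$. The sketch below assumes affine latencies and takes $l^*$ as a given number; for the illustrative two-link network with $l_a(u) = u$ and $l_b(u) = 1$ the social optimum splits the flow evenly, so $l^* = 3/4$. It also checks the bound $\bar l \ge \Phi/2$ used later in the proof.

```python
from typing import List


def rosenthal_potential(loads: List[float], slopes: List[float],
                        offsets: List[float], l_star: float) -> float:
    """Phi(x) = sum_e int_0^{x_e} (a_e*u + b_e) du + l*, for affine latencies."""
    return sum(0.5 * a * xe * xe + b * xe
               for xe, a, b in zip(loads, slopes, offsets)) + l_star


def average_latency_affine(loads: List[float], slopes: List[float],
                           offsets: List[float]) -> float:
    """bar_l(x) = sum_e x_e * l_e(x_e)."""
    return sum(xe * (a * xe + b) for xe, a, b in zip(loads, slopes, offsets))


loads, slopes, offsets = [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]   # l_a(u) = u, l_b(u) = 1
phi = rosenthal_potential(loads, slopes, offsets, l_star=0.75)
print(phi)                                                         # 0.125 + 0.5 + 0.75 = 1.375
print(average_latency_affine(loads, slopes, offsets) >= phi / 2)   # True
```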

Now we calculate the derivative of this potential $\Phi$ with respect to time. Let $L_e$
be an antiderivative of $l_e$. Then

$$\dot\Phi = \sum_{e \in E} \frac{d}{dt} L_e(x_e) = \sum_{e \in E} \dot x_e \cdot l_e(x_e) = \sum_{e \in E} \sum_{p \ni e} \dot x_p \cdot l_e(x_e).$$

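Now we substitute the replicator dynamics (2) into this equation and obtain

$$\begin{aligned}
\dot\Phi &= \sum_{e \in E} \sum_{p \ni e} \big(\lambda(x) \cdot x_p \cdot (\bar l(x) - l_p(x))\big) \cdot l_e(x_e) \\
&= \lambda(x) \sum_{p \in P} \sum_{e \in p} x_p \cdot (\bar l(x) - l_p(x)) \cdot l_e(x_e) \\
&= \lambda(x) \sum_{p \in P} x_p \cdot (\bar l(x) - l_p(x)) \cdot l_p(x) \\
&= \lambda(x) \Big(\bar l(x) \sum_{p \in P} x_p l_p(x) - \sum_{p \in P} x_p l_p(x)^2\Big) \\
&= \lambda(x) \Big(\bar l(x)^2 - \sum_{p \in P} x_p l_p(x)^2\Big).
\end{aligned} \tag{5}$$

By Jensen’s inequality this difference is non-positive.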
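Equation (5) can be sanity-checked against a finite difference of $\Phi$. The sketch assumes parallel links with illustrative affine latencies and $\lambda = 1$: it moves the flow a tiny step along the replicator dynamics (2) and compares the resulting change of $\Phi$ with the closed form. The constant $l^*$ cancels in the difference and is omitted.

```python
from typing import List

slopes, offsets = [1.0, 2.0], [0.5, 0.0]        # l_1(u) = u + 0.5, l_2(u) = 2u (illustrative)


def potential(x: List[float]) -> float:
    """Sum of the integrals in (4) for affine latencies (the constant l* is dropped)."""
    return sum(0.5 * a * xe * xe + b * xe for xe, a, b in zip(x, slopes, offsets))


def latencies(x: List[float]) -> List[float]:
    return [a * xe + b for xe, a, b in zip(x, slopes, offsets)]


x = [0.3, 0.7]
lat = latencies(x)
avg = sum(xp * lp for xp, lp in zip(x, lat))
closed_form = avg ** 2 - sum(xp * lp ** 2 for xp, lp in zip(x, lat))   # (5) with lambda = 1

h = 1e-7
x_next = [xp + h * xp * (avg - lp) for xp, lp in zip(x, lat)]          # Euler step of (2)
finite_difference = (potential(x_next) - potential(x)) / h
print(closed_form, finite_difference)   # both approximately -0.0756, i.e. <= 0 as Jensen predicts
```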

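As long as we are not at an $\epsilon$-approximate equilibrium, there must be a
population share of magnitude at least $\epsilon$ with latency at least $(1+\epsilon) \cdot \bar l(x)$. For
fixed $\bar l(x)$ the term $\sum_{p \in P} x_p l_p(x)^2$ is minimal when the less expensive paths all
have equal latency $l'$. This follows from Jensen’s inequality as well. We have
$\bar l = \epsilon \cdot (1+\epsilon) \cdot \bar l + (1-\epsilon) \cdot l'$ and

$$l' = \Big(1 - \frac{\epsilon^2}{1-\epsilon}\Big) \cdot \bar l. \tag{6}$$

According to equation (5) we have

$$\dot\Phi \le \lambda(x) \cdot \Big(\bar l(x)^2 - \big(\epsilon \cdot ((1+\epsilon) \cdot \bar l(x))^2 + (1-\epsilon) \cdot l'^2\big)\Big). \tag{7}$$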



332 S. Fischer and B. Vocking 



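Substituting (6) into (7), doing some arithmetic, and using $\lambda(x) = \bar l(x)^{-1}$ we get

$$\dot\Phi \le -\lambda(x) \cdot \frac{\epsilon^3}{1-\epsilon} \cdot \bar l(x)^2 \le -\lambda(x) \cdot \epsilon^3 \cdot \bar l(x)^2/2 = -\epsilon^3 \cdot \bar l(x)/2. \tag{8}$$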
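To retrace the arithmetic behind (8): measuring latencies in units of $\bar l(x)$ and inserting $l'$ from (6), the subtracted term in (7) expands as follows.

$$
\begin{aligned}
\epsilon(1+\epsilon)^2 + (1-\epsilon)\Big(1-\frac{\epsilon^2}{1-\epsilon}\Big)^2
&= \epsilon + 2\epsilon^2 + \epsilon^3 + (1-\epsilon) - 2\epsilon^2 + \frac{\epsilon^4}{1-\epsilon} \\
&= 1 + \epsilon^3 + \frac{\epsilon^4}{1-\epsilon} = 1 + \frac{\epsilon^3}{1-\epsilon}.
\end{aligned}
$$

Hence $\dot\Phi \le -\lambda(x) \cdot \frac{\epsilon^3}{1-\epsilon} \cdot \bar l(x)^2$, and $\frac{\epsilon^3}{1-\epsilon} \ge \frac{\epsilon^3}{2}$ holds for all $0 < \epsilon < 1$.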

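We can bound $\bar l$ from below by $\Phi/2$:

$$\bar l(x) = \sum_{p \in P} x_p l_p(x) = \sum_{p \in P} \sum_{e \in p} x_p l_e(x_e) = \sum_{e \in E} \sum_{p \ni e} x_p l_e(x_e) = \sum_{e \in E} x_e l_e(x_e) \ge \sum_{e \in E} \int_0^{x_e} l_e(u) \, du. \tag{9}$$

The inequality holds because of monotonicity of the latency functions. By
definition of $l^*$, also $\bar l \ge l^*$. Altogether we have
$\bar l + \bar l \ge l^* + \sum_{e \in E} \int_0^{x_e} l_e(u) \, du = \Phi$, or $\bar l \ge \Phi/2$. Substituting this into
inequality (8) we get the differential inequality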

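$$\dot\Phi \le -\epsilon^3 \Phi/4,$$

which can be solved by standard methods: every solution satisfies

$$\Phi(t) \le \Phi_{\text{init}} \cdot e^{-\epsilon^3 t/4},$$

where $\Phi_{\text{init}} = \Phi(0)$ is given by the boundary conditions. This only holds
as long as we are not at an $\epsilon$-approximate equilibrium. Hence, we must reach an
$\epsilon$-approximate equilibrium at the latest when this upper bound falls below the
minimum value $\Phi^*$ of $\Phi$. We find that the smallest $t$ fulfilling
$\Phi_{\text{init}} \cdot e^{-\epsilon^3 t/4} \le \Phi^*$ is

$$t = 4\epsilon^{-3} \ln \frac{\Phi_{\text{init}}}{\Phi^*}.$$

Clearly, $\Phi^* \ge l^*$ and $\Phi_{\text{init}} \le 2 \cdot l_{\max}$, which establishes our assertion. □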
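The bound can be compared with a direct simulation. The sketch below is an illustration under our own assumptions: two parallel links with constant latencies $1$ and $1/2$, half of the agents initially on each link, $\epsilon = 0.1$, and the relative replicator dynamics integrated by explicit Euler steps. Here $l_{\max} = 1$ and $l^* = 1/2$. The simulated time (analytically $\ln 5 + 2\ln 1.8 \approx 2.785$ for this instance) is far below $4\epsilon^{-3}\ln(2 l_{\max}/l^*)$, as expected from a worst-case upper bound.

```python
import math
from typing import List


def time_to_eps_approx(x: List[float], lat: List[float], eps: float,
                       h: float = 1e-3, t_max: float = 1e4) -> float:
    """Integrate x_p' = x_p * (bar_l - l_p) / bar_l (relative replicator dynamics,
    constant latencies) until the population is an eps-approximate equilibrium."""
    t = 0.0
    while t < t_max:
        avg = sum(xp * lp for xp, lp in zip(x, lat))
        x_eps = sum(xp for xp, lp in zip(x, lat) if lp >= (1 + eps) * avg)
        if x_eps <= eps:
            return t
        x = [xp + h * xp * (avg - lp) / avg for xp, lp in zip(x, lat)]
        t += h
    raise RuntimeError("no eps-approximate equilibrium before t_max")


eps, l_max, l_star = 0.1, 1.0, 0.5
t_sim = time_to_eps_approx([0.5, 0.5], [l_max, l_star], eps)
t_bound = 4 * eps ** -3 * math.log(2 * l_max / l_star)
print(t_sim, t_bound)   # roughly 2.8 versus roughly 5545
```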

Where is this proof pessimistic? There are only two estimates. We will give an
example where these estimates almost hold with equality.

1. Inequality (9) holds with equality if we use constant latency functions.

2. The considerations leading to equation (7) are pessimistic in that they
assume that there are always very few agents in $P_\epsilon$ and that their latency is
always close to $(1+\epsilon) \cdot \bar l(x)$.

Starting from these observations we can construct a network in which we can 
prove our lower bound. 

Proof (of Theorem 2, Sketch). Our construction is by induction. Consider a
network with $m$ parallel links numbered 1 through $m$. We show that the
time of convergence for $m$ links is at least $(m-1) \cdot \Omega(\epsilon^{-1})$.
For $i \in \{1, \dots, m\}$ define the (constant) latency functions $l_i(x) = (1+c\epsilon)^{-i+1}$,
$c$ a constant large enough. Initially, some agents are on the most expensive link
1, very few agents are on links $3, \dots, m$, and the majority is on link 2. More
precisely, $x_1(0) = 2\epsilon$, $x_2(0) = 1 - 2\epsilon - \gamma$, and $\sum_{i=3}^{m} x_i(0) = \gamma$, where $\gamma$ is some
small constant. In the induction step we will define the values of the $x_i(0)$ for
$i > 2$. Initially, assume $\gamma = 0$.

First consider the case where $m = 2$. Clearly, the average latency $\bar l$ is
dominated by link 2, i.e., $l_1$ is by a factor of at least $(1+\epsilon)$ more expensive than
$\bar l$. This implies that we cannot be at an $\epsilon$-approximate equilibrium as long as
$x_1(t) > \epsilon$. Since link 1 is by a factor of $1 + \Theta(\epsilon)$ more expensive than $\bar l$, we have
$\dot x_1 = -\Theta(\epsilon^2)$, and it takes time $\Omega(\epsilon^{-1})$ for $x_1$ to decrease from $2\epsilon$ to $\epsilon$.

Now consider a network with $m > 2$ edges. By the induction hypothesis we know
that it takes time $t_m = (m-2) \cdot \Omega(\epsilon^{-1})$ for the agents to shift their load to link
$m-1$. We carefully select $x_m(0)$ such that it fulfils the following conditions:

- For $t < t_m$ the population share on link $m$ does not lower $\bar l$ significantly, so
that our assumptions on the growth rates still hold.

- For $t = t_m$, $x_m(t)$ exceeds a threshold such that $\bar l$ decreases below
$l_{m-1}/(1+\epsilon)$, i.e., link $m-1$ becomes more expensive than $\bar l$ by a factor of at
least $1+\epsilon$.

Although we do not calculate it, a boundary condition for $x_m(0)$ having these
properties certainly exists. Because of the second condition, the system just
fails to enter an $\epsilon$-approximate equilibrium at time $t_m$. Because of this, we
must wait another phase of length $\Omega(\epsilon^{-1})$ for the agents on link $m-1$ to switch
to link $m$. Note that there may be agents on link $m-2$ and above. These move
faster towards cheaper links, but this does not affect our lower bound on the
agents on link $m-1$.

We have $l_{\max} = 1$ and $l^* = (1+c\epsilon)^{-m+1}$. For arbitrary ratios $r = l_{\max}/l^*$
we choose $m = \ln(r)$, yielding a lower bound of $\Omega(\epsilon^{-1} \cdot m) = \Omega(\epsilon^{-1} \cdot \ln r)$ on the
time of convergence. □
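The base case $m = 2$ of this construction can be simulated directly. In the sketch below, $c = 4$ and the Euler step size are our own illustrative choices (the proof only requires $c$ to be large enough). It measures how long the relative replicator dynamics needs to move $x_1$ from $2\epsilon$ down to $\epsilon$; halving $\epsilon$ roughly doubles this time, in line with the $\Omega(\epsilon^{-1})$ bound for this phase.

```python
def halving_time(eps: float, c: float = 4.0, h: float = 1e-3) -> float:
    """Time for x_1 to fall from 2*eps to eps on two parallel links with constant
    latencies l_1 = 1 and l_2 = 1/(1 + c*eps), under x_1' = x_1 (bar_l - l_1) / bar_l."""
    l1, l2 = 1.0, 1.0 / (1.0 + c * eps)
    x1, t = 2.0 * eps, 0.0
    while x1 > eps:
        avg = x1 * l1 + (1.0 - x1) * l2
        x1 += h * x1 * (avg - l1) / avg
        t += h
    return t


for eps in (0.04, 0.02, 0.01):
    print(eps, halving_time(eps), eps * halving_time(eps))  # last column stays roughly constant
```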

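Proof (of Theorem 3). In the multicommodity case, equation (5) takes the form

$$\dot\Phi = \sum_{i \in I} \lambda_i(x) \Big(d_i \cdot \bar l_i(x)^2 - \sum_{p \in P_i} x_p l_p(x)^2\Big).$$

Let $i^* = \arg\min_{i \in I} \bar l_i$. In the worst case, all agents in $x_\epsilon$ belong to commodity $i^*$,
and equation (8) takes the form $\dot\Phi \le -\epsilon^3 \cdot \bar l_{i^*}(x)/2$. Then Theorem 3 follows directly
by the trivial estimate $\bar l_{i^*} \ge l^*$. □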

6 Conclusions and Open Problems 

We introduced the replicator dynamics as a model for the dynamic behaviour of
selfish agents in networks. For the symmetric case we have given an essentially
tight bound for the time of convergence of this dynamics that is polynomial in
the degree of approximation and logarithmic in network parameters. For the
multicommodity case, we derived an upper bound which is linear in the network
parameters. This model can also be used in the design of distributed load-balancing
algorithms, since one can reasonably assume that an algorithm based
on this model would be accepted by network users. Several interesting problems
remain open:

- The replicator dynamics is based on random experiments performed by
agents playing against each other. By taking this “fluid limit” we ensure that
the outcomes of the experiments equal their expected values. What happens if
we go back to the discrete process?

- How can one improve the upper bound in the multicommodity scenario?

- There is a delay between the moment the agents observe load and latency
in the network and the moment they actually change their strategy. What
effects does the use of old information have? Similar questions are studied,
e.g., in [10].

- A question of theoretical interest is: What is the time of convergence to the
final Nash equilibrium?
Abstract. How efficiently can we search an unknown environment for
a goal in unknown position? How much would it help if the environment 
were known? We answer these questions for simple polygons and for 
general graphs, by providing online search strategies that are as good 
as the best offline search algorithms, up to a constant factor. For other 
settings we prove that no such online algorithms exist. 

1 Introduction 

One of the recurring tasks in life is to search one’s environment for an object
whose location is, at least temporarily, unknown. This problem comes in
different variations. The searcher may have vision, or be limited to sensing by
touch. The environment may be a simple polygon, e.g., an apartment, or a graph,
like a street network. Finally, the environment may be known to the searcher, or
be unknown.

Such search problems have attracted a lot of interest in online motion planning;
see, for example, the survey by Berman [4]. Usually the cost of a search is
measured by the length of the search path traversed; this in turn is compared
against the length of the shortest path from the start position to the point where
the goal is reached. If we are searching in an unknown environment, the maximum
quotient, over all goal positions and all environments, is the competitive
ratio of the search algorithm.

Most prominent is the problem of searching two half-lines emanating from a
common start point. The “doubling” strategy visits the half-lines alternatingly,
each time doubling the depth of exploration. This way, the goal point is reached
after traversing a path at most 9 times as long as its distance from the start,
and the competitive ratio of 9 is optimal for this problem; see Baeza-Yates et
al. [3] and Alpern and Gal [2]. This doubling approach frequently appears as a
subroutine in more complex navigation strategies.

* …many/Hong Kong Joint Research Scheme sponsored by the Research Grants Council
of Hong Kong and the German Academic Exchange Service (Project No. G-…
…Council of the Hong Kong Special Administrative Region, China (Project …
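The doubling strategy and its ratio of 9 can be checked numerically. In this sketch (our own illustration), excursion $i$ explores half-line $i \bmod 2$ up to depth $2^i$ and returns to the start. As usual for this problem, the goal is assumed to lie at distance at least 1 from the start; without such a lower bound no strategy can achieve a constant ratio.

```python
def doubling_search_cost(side: int, dist: float) -> float:
    """Distance walked by the doubling strategy until it reaches a goal at
    distance `dist` (>= 1) on half-line `side` (0 or 1)."""
    if dist < 1:
        raise ValueError("the goal is assumed to be at distance at least 1")
    walked, i = 0.0, 0
    while True:
        depth = 2.0 ** i
        if i % 2 == side and depth >= dist:
            return walked + dist          # goal found on the way out
        walked += 2 * depth               # explore to depth 2^i and come back
        i += 1


worst = max(doubling_search_cost(side, 1 + k * 0.37) / (1 + k * 0.37)
            for side in (0, 1) for k in range(3000))
print(worst)                                          # close to, but below, 9
print(doubling_search_cost(0, 1024.001) / 1024.001)   # about 8.998: goal just past a turning point
```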

In searching $m > 2$ half-lines, a ratio that does not depend on $m$ can no longer
be achieved. Indeed, even if the half-lines were replaced by segments of the same
finite length, the goal could be placed at the end of the segment visited last,
causing the ratio to be at least $2m - 1$. Increasing the exploration depth
exponentially by a factor of $m/(m-1)$ is known to lead to an optimal competitive
ratio of $C(m) = 1 + 2m \left(\frac{m}{m-1}\right)^{m-1} \le 1 + 2me$.
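The closed form of $C(m)$ can be checked for small $m$; for $m = 2$ it recovers the ratio 9 of the doubling strategy. A short sketch printing $C(m)$ next to the lower bound $2m - 1$ and the upper bound $1 + 2me$ mentioned in the text:

```python
import math


def optimal_ratio(m: int) -> float:
    """C(m) = 1 + 2 m (m / (m - 1))^(m - 1) for searching m >= 2 half-lines."""
    return 1 + 2 * m * (m / (m - 1)) ** (m - 1)


for m in (2, 3, 5, 10):
    print(m, 2 * m - 1, optimal_ratio(m), 1 + 2 * m * math.e)
```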

Much less is known about more realistic settings. Suppose a searcher with vision
wants to search an unknown simple polygon for a goal in unknown position.
He could employ the $m$-way technique from above: by exploring the shortest
paths from the start to the $m$ reflex vertices of the polygon, ignoring their tree
structure, a competitive ratio of $C(m)$ can easily be achieved [13]. Schuierer [15]
has refined this method and obtained a ratio of $C(2k)$, where $k$ denotes the smallest
number of convex and concave chains into which the polygon’s boundary can
be decomposed.

But these results do not completely settle the problem. For one, it is not clear
why the numbers $m$ or $k$ should measure the difficulty of searching a polygon.
Also, human searchers easily outperform $m$-way search, because they make
educated guesses about the shape of those parts of the polygon not yet visited.

In this paper we take the following approach: Let $\pi$ be a search path for the
fixed polygon $P$, i.e., a path from the start point, $s$, through $P$ from which each
point $p$ inside $P$ will eventually be visible. The cost of getting to $p$ via $\pi$ equals
the length of $\pi$ from $s$ to the first point on $\pi$ from which $p$ is visible, plus the
Euclidean distance from that point to $p$. We divide this value by the length of
the shortest $s$-to-$p$ path in $P$. The maximum of these ratios, over all $p \in P$, is the
search ratio of $\pi$. The lowest search ratio possible, over all search paths, is the
optimum search ratio of $P$; it measures the “searchability” of $P$.

Koutsoupias et al. [14] studied graphs with unit length edges where the goal 
can only be located at vertices, and they only studied the offline case, i. e., with 
full a priori knowledge of the graph. They showed that computing the optimal 
search ratio offline is an NP-complete problem, and gave a polynomial time 
8-approximation algorithm based on the doubling heuristic. 
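For the graph setting of Koutsoupias et al. (unit edge lengths, goals at vertices), the search ratio of one fixed walk is easy to compute: for every vertex, divide the length of the walk up to its first visit by the shortest-path distance from the start, and take the maximum. The sketch below assumes an adjacency-list representation and a small star graph of our own choosing; it only evaluates a given walk and does not attempt the NP-complete optimisation over all walks.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # undirected, both directions listed (assumption)


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm from `source`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def vertex_search_ratio(graph: Graph, walk: List[str]) -> float:
    """Max over vertices v != start of (walk length until v is first reached) / d(start, v)."""
    start = walk[0]
    dist = shortest_distances(graph, start)
    first_visit, travelled = {start: 0.0}, 0.0
    for u, v in zip(walk, walk[1:]):
        travelled += dict(graph[u])[v]        # raises KeyError if u and v are not adjacent
        first_visit.setdefault(v, travelled)
    if set(first_visit) != set(graph):
        raise ValueError("the walk must visit every vertex")
    return max(first_visit[v] / dist[v] for v in graph if v != start)


star = {"c": [("a", 1.0), ("b", 1.0), ("d", 1.0)],
        "a": [("c", 1.0)], "b": [("c", 1.0)], "d": [("c", 1.0)]}
print(vertex_search_ratio(star, ["c", "a", "c", "b", "c", "d"]))  # leaf d first reached after 5 units: 5.0
```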

The crucial question we are considering in this paper is the following: Is 
it possible to design an online search strategy whose search ratio stays within 
a constant factor of the optimum search ratio, for arbitrary instances of the 
environment? Surprisingly, the answer is positive for simple polygons as well as 
for general graphs. (However, for polygons with holes, and for graphs with unit 
edge length, where the goal positions are restricted to the vertices, no such online 
strategy exists.) 

Note that search ratio and competitive ratio have very similar definitions,
but they are actually rather different concepts. For the competitive ratio, an
online search algorithm has no idea what the environment looks like and has to
learn it while searching for the goal. In contrast, the search ratio is defined for
a given fixed environment. Since the optimal search ratio path minimizes the
quotient of search distance over shortest distance to the goal, the optimal search
ratio is actually a lower bound for the competitive ratio of any online search
algorithm. Computing online a $c$-approximation of the optimal search ratio path
means that we compute online a search path whose competitive ratio is at most a
factor of $c$ worse than the optimal competitive ratio of any online search
algorithm, but that does not tell us anything about the competitive ratio itself,
which could be arbitrarily bad. In some special cases, we can search with a
constant competitive ratio, for example on the line, where the optimal competitive
ratio is 9 (the Lost-Cow problem [3]), and in street polygons, where it is $\sqrt{2}$ [12,16].

The search strategies we will present use, as building blocks, modified versions 
of constant-competitive strategies for online exploration, namely the exploration 
strategy by Hoffmann et al. [11] for simple polygons, and the tethered graph 
exploration strategy by Duncan et al. [9]. 

At first glance it seems quite natural to employ an exploration strategy in
searching: after all, either task involves looking at each point of the environment.
But there is a serious difference in performance evaluation! In searching
an environment, we compete against shortest start-to-goal paths, so we have to
proceed in a BFS manner. In exploration, we are up against the shortest round
trip from which each point is visible; this means that once we have entered some
remote part of the environment we should finish it, in a DFS manner, before
moving on. However, we can fit these exploration strategies to our search problem
by restricting them to a bounded part of the environment. This will be
shown in Section 3, where we present our general framework, which turns out to
be quite elegant despite the complex definitions. The framework can be applied
to online and offline search ratio approximations. In Section 2 we review basic
definitions and notations. Our framework will then be applied to searching in
various environments like trees, (planar) graphs, and (rectilinear) polygonal
environments with and without holes in Sections 4 and 5. Finally, in Section 6, we
conclude with a summary of our results (see also Table 1).

2 Definitions 

We want to find a good search path in some given environment $\mathcal{E}$. This might
be a tree, a (planar) graph, or a (rectilinear) polygon with or without holes.
In a graph environment, the edges may either have unit length or arbitrary
length. Edge lengths do not necessarily represent Euclidean distances, not even
in embedded planar graphs. In particular, we do not assume that the triangle
inequality holds.

The goal set, $\mathcal{G}$, is the set of locations in the environment where the (station-
ary!) goal might be hidden. If $\mathcal{E}$ is a graph $G = (V, E)$, then the goal may be
located on some edge (geometric search), i.e., $\mathcal{G} = V \cup E$, or its position may
be restricted to the vertices (vertex search), i.e., $\mathcal{G} = V$. To explore $\mathcal{E}$ means to





R. Fleischer et al. 



Table 1. Summary of our approximation results, where α > 0 is an arbitrary
constant. The entry marked with * had earlier been proven by Koutsoupias et al. [14].
They had also shown that computing the optimal search path is NP-complete for
(planar) graphs. It is also NP-complete for polygons, whereas it is not known to be
NP-complete for trees.

                                                              Polytime approximation ratio
Environment               Edge length      Goal               Online                       Offline
Tree                      unit, arbitrary  vertex, geometric  4                            4
Planar graph              arbitrary        vertex             no search-competitive alg.   8
Planar graph              unit             vertex             104 + 40α + 64/α             4
General graph             unit             vertex             no search-competitive alg.   8*
General graph             arbitrary        geometric          48 + 16α + 32/α              4
Simple polygon            -                -                  212                          8
Rect. simple polygon      -                -                  8√2                          8
Polygon with rect. holes  -                -                  no search-competitive alg.   ?

move around in $\mathcal{E}$ until all potential goal positions $\mathcal{G}$ have been seen. To search
$\mathcal{E}$ means to follow some exploration path in $\mathcal{E}$, the search path, until the goal
has been seen. We assume that all search and exploration paths return to the
start point, $s$, and we make the usual assumption that goals must be at least a
distance 1 away from the start point.¹

For $d \geq 1$, let $\mathcal{E}(d)$ denote the part of $\mathcal{E}$ at distance at most $d$ from $s$. A
depth-$d$ restricted exploration explores all potential goal positions in $\mathcal{G}(d)$. The
exploration path may move outside $\mathcal{E}(d)$ as long as it stays within $\mathcal{E}$. Depth-$d$
restricted search is defined accordingly.

It remains to define what it means that the searcher “sees” the goal. In graph
searching, agents are usually assumed to be blind, i.e., when standing at a vertex
of a directed graph, the agent sees the set of outgoing edges, but neither their
lengths nor the positions of the other vertices are known. Incoming edges cannot
be sensed; see [7]. Blind agents must eventually visit all points in the goal set.

Polygon searchers are usually assumed to have vision, that is, they always 
know the current visibility polygon. Such agents need not visit a goal position 
if they can see it from somewhere else. Searching a polygon means to visit all 
visibility cuts, i.e., all rays extending the edges of the reflex vertices. Actually, 
there is always a subset of the cuts, the essential cuts, whose visit guarantees 
that all other cuts are also visited on the way. 

In case of online algorithms, we assume that agents have perfect memory. 
They always know a map of the part of £ already explored, and they can always 
recognize when they visit some point for the second time, i. e., they have perfect 
localization (the robot localization problem is actually a difficult problem by 
itself, see for example [10]). 



¹ If goals could be arbitrarily close to $s$, no algorithm could be competitive.
We now introduce a few notations. Let $\pi$ be a path in the environment $\mathcal{E}$.
For a point $p \in \pi$ let $\pi(p)$ denote the part of $\pi$ between $s$ and $p$, and $\mathrm{sp}(p)$ the
shortest path from $s$ to $p$ in $\mathcal{E}$. We denote the length of a path segment $\pi(p)$ by
$|\pi(p)|$. Paths computed by some algorithm $A$ will also be named $A$. For a point
$p \in \mathcal{E}$ let $\mathrm{rise}_\pi(p)$ denote the point $q \in \pi$ from which $p$ is seen for the first time
when moving along $\pi$ starting at $s$, see Fig. 1.




Fig. 1. A search path $\pi$ in a polygon, visiting all essential cuts (the dotted lines). The
dashed path is the shortest path $\mathrm{sp}(p)$ from $s$ to the goal $p$. Moving along $\pi$, $p$ can first
be seen from $q = \mathrm{rise}_\pi(p)$.



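The quality of a search path is measured by its search ratio $\mathrm{sr}(\pi)$, defined as

$$\mathrm{sr}(\pi) := \max_{p \in \mathcal{G}} \frac{|\pi(q)|}{|\mathrm{sp}(p)|}, \qquad \text{where } q = \mathrm{rise}_\pi(p).$$

Note that $q = p$ for blind agents.

An optimal search path, $\pi_{\mathrm{opt}}$, is a search path with a minimum search ratio
$\mathrm{sr}_{\mathrm{opt}} = \mathrm{sr}(\pi_{\mathrm{opt}})$.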
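To make the definition concrete, here is a minimal sketch for the simplest case: a blind agent doing vertex search in a known undirected graph with positive edge lengths. There $q = p$, so $\mathrm{sr}(\pi)$ is the largest ratio between the distance travelled along $\pi$ until a vertex is first visited and its shortest-path distance from $s$. The graph representation and the function names are illustrative assumptions, not from the paper; we also assume there are no parallel edges and that the walk visits every vertex.

```python
import heapq
from typing import Dict, List, Tuple

# Undirected graph: an edge {u, v} of length w appears as (v, w) in g[u] and (u, w) in g[v].
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_distances(g: Graph, s: str) -> Dict[str, float]:
    """Dijkstra's algorithm: |sp(p)| for every vertex p reachable from s."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def search_ratio(g: Graph, walk: List[str]) -> float:
    """sr(pi) for a blind agent doing vertex search along the vertex sequence `walk`.

    walk[0] is the start s; the goal set is every vertex except s. For a blind
    agent q = p, so each ratio is (distance walked until p is first visited) / |sp(p)|.
    """
    s = walk[0]
    dist = shortest_distances(g, s)
    length = {(u, v): w for u in g for v, w in g[u]}  # assumes no parallel edges
    first_visit = {s: 0.0}
    travelled = 0.0
    for u, v in zip(walk, walk[1:]):
        travelled += length[(u, v)]
        first_visit.setdefault(v, travelled)  # keep only the first visit
    return max(first_visit[p] / dist[p] for p in g if p != s)


if __name__ == "__main__":
    g = {"s": [("a", 1.0), ("c", 1.0)], "a": [("s", 1.0), ("b", 1.0)],
         "b": [("a", 1.0)], "c": [("s", 1.0)]}
    print(search_ratio(g, ["s", "c", "s", "a", "b"]))       # 3.0
    print(search_ratio(g, ["s", "a", "b", "a", "s", "c"]))  # 5.0
```

In this toy graph, checking the short branch to c first delays a by a factor of 3, which is cheaper than delaying c by a factor of 5; an optimal search path has to balance exactly such trade-offs.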

Since the optimal search path seems hard to compute [14], we are interested
in finding good approximations of the optimal search path, in offline and online
scenarios. We say a search path $\pi$ is $C$-search-competitive (with respect to the
optimal search path $\pi_{\mathrm{opt}}$) if $\mathrm{sr}(\pi) \leq C \cdot \mathrm{sr}(\pi_{\mathrm{opt}})$. Note that $\pi$ is then a
$C \cdot \mathrm{sr}(\pi_{\mathrm{opt}})$-competitive search algorithm (in the usual competitive sense).



3 A General Approximation Framework 

In this section we will show how to transform an exploration algorithm, offline or 
online, into a search algorithm, without losing too much on the approximation 
factor. 

Let $\mathcal{E}$ be the given environment and $\pi_{\mathrm{opt}}$ an optimal search path. We assume
that, for any point $p$, we can reach $s$ from $p$ on a path of length at most $|\mathrm{sp}(p)|$.²
For $d \geq 1$, let Expl(d) be a family of depth-$d$ restricted exploration algorithms
for $\mathcal{E}$, either online or offline. Let OPT and OPT(d) denote the corresponding
optimal offline exploration and depth-$d$ restricted exploration algorithms, respectively.

² Note that this is not the case for directed graphs, but it holds for undirected graphs
and polygonal environments. We will see later that there is no constant-competitive
online search algorithm for directed graphs anyway.
Definition 1. The family Expl(d) is DREP (depth restricted exploration prop-
erty) if there are constants $\beta > 0$ and $C_\beta \geq 1$ such that, for any $d \geq 1$, Expl(d)
is $C_\beta$-competitive against the optimal algorithm OPT($\beta d$), i.e.,
$|\mathrm{Expl}(d)| \leq C_\beta \cdot |\mathrm{OPT}(\beta d)|$. □

In the normal competitive framework we would compare Expl(d) to the op-
timal algorithm OPT(d), i.e., $\beta = 1$. As we will see later, our more general
definition sometimes makes it easier to find DREP exploration algorithms. Usu-
ally, we cannot just take an exploration algorithm Expl for $\mathcal{E}$ and restrict it to
points at distance at most $d$ from $s$. This way, we might miss useful shortcuts
outside of $\mathcal{E}(d)$. Even worse, it may not be possible to determine in an online set-
ting which parts of the environment belong to $\mathcal{E}(d)$, making it difficult to explore
the right part of $\mathcal{E}$. In the next two sections we will derive DREP exploration
algorithms for graphs and polygons by carefully adapting existing exploration
algorithms for the entire environment.

To obtain a search algorithm for $\mathcal{E}$ we use the doubling strategy. For $i =
1, 2, 3, \ldots$, we successively run the exploration algorithm Expl($2^i$), each time
starting at $s$.
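The doubling strategy itself is only a loop around the depth-restricted explorer. The following minimal sketch treats Expl(d) as a black-box callable `explore(d)` (a hypothetical interface, not from the paper) that walks a closed tour from $s$ covering $\mathcal{G}(d)$ and reports its length and whether the goal was seen. For simplicity it charges the full tour of the final round, which can only overestimate the distance walked until the goal is first seen.

```python
from typing import Callable, Optional, Tuple


def doubling_search(explore: Callable[[float], Tuple[float, bool]],
                    max_rounds: int = 64) -> Optional[Tuple[int, float]]:
    """Doubling strategy: run explore(2**i) for i = 1, 2, 3, ... until the goal is seen.

    `explore(d)` is a hypothetical depth-d restricted exploration that starts and
    ends at s and returns (length of its tour, whether the goal was seen).
    Returns (round in which the goal was seen, total distance charged), or None.
    """
    total = 0.0
    for i in range(1, max_rounds + 1):
        tour_length, found = explore(2.0 ** i)
        total += tour_length  # upper bound: the whole tour of the final round is charged
        if found:
            return i, total
    return None


if __name__ == "__main__":
    # Toy environment: a single ray with the goal at distance 5; exploring depth d costs 2d.
    print(doubling_search(lambda d: (2 * d, d >= 5)))  # (3, 28.0)
```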

Theorem 1. The doubling strategy based on a DREP exploration strategy is a
$4\beta C_\beta$-search-competitive (plus an additive constant) search algorithm for blind
agents, and an $8\beta C_\beta$-search-competitive (plus an additive constant) search algo-
rithm for agents with vision.



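Proof. Consider one iteration of the doubling strategy with search radius $d \geq 1$.
Let last(d) be the point on the optimal search path $\pi_{\mathrm{opt}}$ for $\mathcal{E}$ from which we
see the last point at distance at most $d$ from $s$ when moving along $\pi_{\mathrm{opt}}$. If
the agent has vision, last(d) could lie outside of $\mathcal{E}(d)$. Note that
$|\mathrm{OPT}(d)| \leq |\pi_{\mathrm{opt}}(\mathrm{last}(d))| + |\mathrm{sp}(\mathrm{last}(d))|$. For a blind agent we have
$|\mathrm{sp}(\mathrm{last}(d))| \leq d$. Thus,

$$\mathrm{sr}_{\mathrm{opt}} \geq \frac{|\pi_{\mathrm{opt}}(\mathrm{last}(d))|}{d} \geq \frac{|\mathrm{OPT}(d)| - d}{d}, \quad\text{i.e.,}\quad |\mathrm{OPT}(d)| \leq d \cdot (\mathrm{sr}_{\mathrm{opt}} + 1).$$

If the goal is at distance $2^j + \varepsilon$ for some small $\varepsilon > 0$, then the search ratio of
the doubling strategy is bounded by

$$\frac{\sum_{i=1}^{j+1} |\mathrm{Expl}(2^i)|}{2^j} \leq \frac{\sum_{i=1}^{j+1} C_\beta \cdot |\mathrm{OPT}(\beta 2^i)|}{2^j} \leq \frac{\sum_{i=1}^{j+1} C_\beta \beta 2^i \cdot (\mathrm{sr}_{\mathrm{opt}} + 1)}{2^j} \leq 4\beta C_\beta \cdot (\mathrm{sr}_{\mathrm{opt}} + 1).$$

If the agent has vision, we only know that $|\mathrm{sp}(\mathrm{last}(d))| \leq |\pi_{\mathrm{opt}}(\mathrm{last}(d))|$.
Thus, $\mathrm{sr}_{\mathrm{opt}} \geq \frac{|\pi_{\mathrm{opt}}(\mathrm{last}(d))|}{d} \geq \frac{|\mathrm{OPT}(d)|}{2d}$, or $|\mathrm{OPT}(d)| \leq 2d \cdot \mathrm{sr}_{\mathrm{opt}}$. So in this case
we can bound the search ratio by

$$\frac{\sum_{i=1}^{j+1} C_\beta \cdot |\mathrm{OPT}(\beta 2^i)|}{2^j} \leq \frac{\sum_{i=1}^{j+1} C_\beta \beta 2^{i+1} \cdot \mathrm{sr}_{\mathrm{opt}}}{2^j} \leq 1 + 8\beta C_\beta \cdot \mathrm{sr}_{\mathrm{opt}}.$$

□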
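The constants in Theorem 1 come from two facts: the per-round bound $|\mathrm{OPT}(\beta 2^i)| \leq \beta 2^i (\mathrm{sr}_{\mathrm{opt}} + 1)$ for blind agents (or $\leq 2\beta 2^i \mathrm{sr}_{\mathrm{opt}}$ with vision), and the geometric sum $\sum_{i=1}^{j+1} 2^i = 2^{j+2} - 2 < 4 \cdot 2^j$. The short sketch below (helper names are ours) evaluates the leading factor, e.g. $8 \cdot 1 \cdot 26.5 = 212$ for simple polygons with the constants of Lemma 4, and lets one check numerically that the blind-agent sum stays strictly below $4\beta C_\beta(\mathrm{sr}_{\mathrm{opt}} + 1)$.

```python
def theorem1_factor(beta: float, c_beta: float, vision: bool) -> float:
    """Leading search-competitive factor of Theorem 1 (additive constant omitted)."""
    return (8.0 if vision else 4.0) * beta * c_beta


def blind_round_sum_bound(beta: float, c_beta: float, sr_opt: float, j: int) -> float:
    """Middle term of the blind-agent chain:
    sum_{i=1}^{j+1} C_beta * beta * 2**i * (sr_opt + 1), divided by 2**j."""
    total = sum(c_beta * beta * 2 ** i * (sr_opt + 1) for i in range(1, j + 2))
    return total / 2 ** j


if __name__ == "__main__":
    print(theorem1_factor(1, 1, vision=False))    # trees with DFS: 4.0
    print(theorem1_factor(1, 26.5, vision=True))  # simple polygons: 212.0
    print(blind_round_sum_bound(1, 1, 1, 10))     # just below 4 * 1 * 1 * (1 + 1) = 8
```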



The only problem is now to find good DREP exploration algorithms for 
various environments. 



4 Searching Graphs 

We distinguish between graphs with unit length vs. arbitrary length edges, pla-
nar vs. non-planar graphs, directed vs. undirected graphs, and vertex vs. geo-
metric search. We assume agents are blind, i.e., at any vertex they can only
sense the number of outgoing edges, but they cannot sense the incoming edges
or the endpoints of the outgoing edges. In the vertex search problem, we as-
sume w.l.o.g. that graphs do not have parallel edges. Otherwise, there can be no
constant-search-competitive vertex search algorithm: in Fig. 2(iv), the optimal
search path $s \to v \to t \to s$ has length 3, whereas any online search path can
be forced to cycle often between $s$ and $v$ before traversing the edge $v \to t$. Note
that the same example also works with undirected edges.



4.1 Non-competitiveness Results 

We first show that for many variants there is no constant-search-competitive 
online search algorithm. Incidentally, there is also no constant-competitive online 
exploration algorithm for these graph classes. Note that non-search-competitive- 
ness for planar graphs implies non-search-competitiveness for general graphs, 
non-search-competitiveness for unit length edges implies non-search-competitive- 
ness for arbitrary length edges, and non-search-competitiveness for undirected 
graphs implies non-search-competitiveness for directed graphs (we could replace 
each undirected edge with directed edges in both directions). 

Theorem 2. For blind agents, there is no constant-search-competitive online
vertex search algorithm for (i) non-planar graphs, (ii) directed planar graphs,
(iii) planar graphs with arbitrary edge lengths. Further, there is no constant-
search-competitive online geometric search algorithm for directed graphs with
unit length edges.

Proof. It is not difficult to verify the claims on the graphs in Fig. 2(i)-(iii). □ 





4.2 Competitive Search in Graphs 

In this subsection we will present search-competitive online and offline search 
algorithms for the remaining graph classes. We assume in this subsection that 
graphs are always undirected. 
Trees. On trees, DFS is a 1-competitive online exploration algorithm for vertex
and geometric search that is DREP; it is still 1-competitive when restricted to
search depth $d$, for any $d \geq 1$. Thus, the doubling strategy gives a polynomial
time 4-search-competitive search algorithm for trees, online and offline. On the
other hand, it is an open problem whether the computation of an optimal vertex
or geometric search path in trees with unit length edges is NP-complete [14].
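For intuition about why DFS is 1-competitive here: a closed tour from $s$ that covers all points of a tree within distance $d$ must traverse every covered edge piece at least twice, since removing that piece disconnects the tree, and depth-bounded DFS traverses each piece exactly twice. The sketch below assumes a known rooted tree with positive edge lengths and geometric search (the representation and names are illustrative, not from the paper) and computes this tour length, i.e. twice the total length of the part of the tree within distance $d$.

```python
from typing import Dict, List, Tuple

# Rooted tree: each vertex maps to its children together with the edge lengths.
Tree = Dict[str, List[Tuple[str, float]]]


def restricted_dfs_tour(tree: Tree, s: str, d: float) -> float:
    """Length of a DFS tour from s that covers every point within distance d of s.

    Edges that cross distance d are walked only up to distance d and back,
    as in geometric search; every covered piece is traversed exactly twice.
    """
    total = 0.0
    stack = [(s, 0.0)]  # (vertex, its distance from s)
    while stack:
        u, depth = stack.pop()
        if depth >= d:
            continue  # nothing left to cover below u
        for v, w in tree.get(u, []):
            reach = min(w, d - depth)  # how far along (u, v) we must walk
            total += 2 * reach         # forward and back
            if depth + w <= d:
                stack.append((v, depth + w))
    return total


if __name__ == "__main__":
    t = {"s": [("a", 1.0), ("c", 3.0)], "a": [("b", 2.0)]}
    print(restricted_dfs_tour(t, "s", 2.0))  # 8.0
    print(restricted_dfs_tour(t, "s", 4.0))  # 12.0
```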

Competitive Graph Search Algorithms. We will now give competitive 
search algorithms for planar graphs with unit length edges (vertex search) and 
for general graphs with arbitrary length edges (geometric search). Both algo-
rithms are based on an algorithm for online tethered graph exploration.

In the tethered exploration problem the agent is fixed to the start point by a
rope of restricted length. An optimal solution to this problem was given by Duncan
et al. [9]. Their algorithm can explore an unknown graph with unit length edges
in $2|E| + (4 + \frac{16}{\alpha})|V|$ edge traversals, using a rope of length $(1+\alpha)d$, where
$d$ is the distance of the point farthest away from the start point and $\alpha > 0$ is
some parameter. As they pointed out, the algorithm can also be used for depth
restricted exploration. To explore all edges in $G(d)$, for $d \geq 1$, their
algorithm uses at most $2|E((1+\alpha)d)| + (4 + \frac{16}{\alpha})|V((1+\alpha)d)|$ edge traversals
using a rope of length $(1+\alpha)d$. Let us call this algorithm Expl(d). The algorithm
explores the graph in a mixture of bounded-depth DFS on $G$, DFS on some
spanning tree of $G$, and recursive calls to explore certain large subgraphs. The
bound for the number of edge traversals can intuitively be explained as follows:
bounded-depth DFS visits each edge at most twice, thus the term $2|E((1+\alpha)d)|$;
DFS on the spanning tree visits every node at most twice, but nodes can be in
two overlapping spanning trees, thus the term $4|V((1+\alpha)d)|$; relocating between
recursive calls does not happen too often (because of the size of the subgraphs),
giving the term $\frac{16}{\alpha}|V((1+\alpha)d)|$.

We note that Expl(d) can be modified to run on graphs with arbitrary length
edges. Then we do not bound the number of edge traversals, but the total length
of all traversed edges. Let length(E) denote the total length of all edges in $E$.
It is possible to adapt the proofs in [9] to prove the following lemma.

Lemma 1. In graphs with arbitrary length edges, Expl(d) explores all edges
and vertices in $G(d)$ using a rope of length $(1+\alpha)d$ at a cost of at most
$(4 + \frac{8}{\alpha}) \cdot \mathrm{length}(E((1+\alpha)d))$.

Proof. (Sketch) Intuitively, DFS and bounded-depth DFS traverse each edge
at most twice, thus the term $4 \cdot \mathrm{length}(E((1+\alpha)d))$; relocating between recursive
calls does not happen too often and subgraphs do not overlap (at least not their
edges), thus the term $\frac{8}{\alpha} \cdot \mathrm{length}(E((1+\alpha)d))$. □

Lemma 2. In planar graphs with unit length edges, Expl(d) is a DREP online
vertex exploration algorithm with $\beta = 1 + \alpha$ and $C_\beta = 10 + \frac{16}{\alpha}$.

Proof. $|E((1+\alpha)d)| \leq 3|V((1+\alpha)d)| - 6$ by Euler’s formula. Thus, the number of
edge traversals is at most $(10 + \frac{16}{\alpha})|V((1+\alpha)d)|$. On the other hand, OPT($(1+\alpha)d$)
must visit each vertex in $V((1+\alpha)d)$ at least once. □
Theorem 3. The doubling strategy based on Expl(d) is a $(104 + 40\alpha + \frac{64}{\alpha})$-
search-competitive online vertex search algorithm for blind agents in planar
graphs with unit length edges. The competitive ratio is minimal for $\alpha = \sqrt{8/5}$. □

Lemma 3. In general graphs with arbitrary length edges, Expl(d) is a DREP
online geometric exploration algorithm with $\beta = 1 + \alpha$ and $C_\beta = 4 + \frac{8}{\alpha}$.

Proof. The total cost of Expl(d) is at most $(4 + \frac{8}{\alpha}) \cdot \mathrm{length}(E((1+\alpha)d))$ by Lem-
ma 1. On the other hand, OPT($(1+\alpha)d$) must traverse each edge in $E((1+\alpha)d)$
at least once. □

Theorem 4. The doubling strategy based on Expl(d) is a $(48 + 16\alpha + \frac{32}{\alpha})$-
search-competitive online geometric search algorithm for blind agents in general
graphs with arbitrary length edges. □
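Assuming $C_\beta = 10 + 16/\alpha$ (Lemma 2) and $C_\beta = 4 + 8/\alpha$ (Lemma 3) with $\beta = 1 + \alpha$, the blind-agent bound of Theorem 1 expands to $4(1+\alpha)(10 + 16/\alpha) = 104 + 40\alpha + 64/\alpha$ and $4(1+\alpha)(4 + 8/\alpha) = 48 + 16\alpha + 32/\alpha$. Since $a + b\alpha + c/\alpha$ is minimized over $\alpha > 0$ at $\alpha = \sqrt{c/b}$ with value $a + 2\sqrt{bc}$, the sketch below (function names are ours) checks numerically that the planar bound is smallest at $\alpha = \sqrt{8/5} \approx 1.26$, giving about 205.2; the same rule gives $\alpha = \sqrt{2}$ and about 93.3 for Theorem 4.

```python
import math


def planar_bound(alpha: float) -> float:
    """Theorem 1 (blind) with beta = 1 + alpha and C_beta = 10 + 16/alpha."""
    return 4 * (1 + alpha) * (10 + 16 / alpha)


def geometric_bound(alpha: float) -> float:
    """Theorem 1 (blind) with beta = 1 + alpha and C_beta = 4 + 8/alpha."""
    return 4 * (1 + alpha) * (4 + 8 / alpha)


def best_alpha(b: float, c: float) -> float:
    """Minimiser of a + b*alpha + c/alpha over alpha > 0 (independent of a)."""
    return math.sqrt(c / b)


a_planar = best_alpha(40, 64)  # sqrt(8/5), about 1.265
a_geom = best_alpha(16, 32)    # sqrt(2), about 1.414

if __name__ == "__main__":
    print(round(planar_bound(a_planar), 2), round(geometric_bound(a_geom), 2))  # 205.19 93.25
```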



5 Searching Polygons 

5.1 Simple Polygons 

A simple polygon $P$ is given by a closed non-intersecting polygonal chain. We
assume that agents have vision. To apply our framework we need a DREP online
exploration algorithm Expl(d).





Fig. 3. (i) PE(d) explores the left reflex vertex $v_l$ along a circular arc (1), returns
to the start (2) and explores the right reflex vertex $v_r$ likewise (3) + (4). On the other
hand, the shortest exploration path for $P(d)$ in $P$, the dashed line, leaves $P(d)$. But
we can extend $P(d)$ by the circular arc. (ii) PE(d) leaves $P(d)$ whereas the shortest
exploration path for $P(d)$ lies inside $P(d)$.



The only known algorithm, PE, for the online exploration of a simple polygon,
by Hoffmann et al. [11], achieves a competitive ratio of 26.5. Now we must adapt
this algorithm to depth restricted exploration. The undetected parts of $P$ always
lie behind cuts $c_v$ emanating from reflex vertices $v$. These reflex vertices are
called unexplored as long as we have not visited the corresponding cut $c_v$. We
modify PE so that it explores $P$ only up to distance $d$ from $s$. The algorithm
always maintains a list of unexplored reflex vertices and successively visits the
corresponding cuts. While exploring a reflex vertex (along a sequence of line
segments and circular arcs), more unexplored reflex vertices may be detected or
unexplored reflex vertices may become explored. These vertices are inserted into
or deleted from the list, respectively. In PE(d), unexplored reflex vertices at a
distance greater than $d$ from the start are ignored, i.e., although they may
be detected, they are not inserted into the list. Let OPT($P$, $d$) be the shortest
path that sees all points in $P(d)$.

Note that OPT($P$, $d$) and PE(d) may leave $P(d)$, see Fig. 3. Nevertheless,
it is possible to adapt the analysis of Hoffmann et al. There are actually two
ways to do this. Either we enlarge $P(d)$ without creating new reflex vertices
such that the larger polygon contains OPT($P$, $d$) and PE(d), or we redo the
analysis of Hoffmann et al. in $P$ restricted to OPT($P$, $d$) and PE(d), which is
possible. In the former case we can use some sort of extended boundary around
the boundary of $P(d)$ such that the enlarged polygon contains PE(d) and
OPT($P$, $d$), see the convex enlargements in Fig. 3. It may happen that these
new extensions overlap for different parts of $P(d)$. However, this does not affect
the analysis of Hoffmann et al.

Lemma 4. In a simple polygon, PE(d) is a DREP online exploration algorithm
with $\beta = 1$ and $C_\beta = 26.5$. □

Theorem 5. The doubling strategy based on PE(d) is a 212-search-competitive
online search algorithm for an agent with vision in a simple polygon. There is
also a polynomial time 8-search-competitive offline search algorithm.

Proof. The online search-competitiveness follows from Lemma 4 and Theorem 1. 

If we know the polygon, we can compute OPT(P, d) in polynomial time by 
adapting a corresponding algorithm for P. Every known polynomial time offline 
exploration algorithm builds a sequence of the essential cuts, see for example [5, 
18,17,8]. Any of these algorithms could be used in our framework. Since an 
optimal algorithm has approximation factor C = 1, our framework yields an 
approximation of the optimal search ratio within a factor of 8. 

If we skip the step with distance $2^i$ whenever there is no reflex vertex at a distance
between $2^{i-1}$ and $2^i$, the total running time is bounded by a polynomial in the
number of vertices of $P$. □
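The skipping rule can be made concrete: only rounds $i$ whose distance range $(2^{i-1}, 2^i]$ contains a reflex vertex are run, so the number of rounds is at most the number of reflex vertices. A minimal sketch, assuming the shortest-path distances of the reflex vertices from $s$ are already known (the helper names are hypothetical, and we place all distances of at most 2 into round 1):

```python
from typing import List


def round_index(dist: float) -> int:
    """Smallest i >= 1 with 2**i >= dist, i.e. the round whose range contains dist."""
    i = 1
    while 2.0 ** i < dist:
        i += 1
    return i


def rounds_to_run(reflex_distances: List[float]) -> List[int]:
    """Doubling rounds that are not skipped: one per occupied distance range."""
    return sorted({round_index(x) for x in reflex_distances})


if __name__ == "__main__":
    print(rounds_to_run([1.5, 3.0, 3.9, 40.0]))  # [1, 2, 6]: rounds 3, 4 and 5 are skipped
```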

Now the question arises whether there is a polynomial time algorithm that 
computes the optimal search path in a simple polygon. All essential cuts need 
to be visited, so we can try to visit them in any possible order. However, we do 
not know where exactly we should visit a cut. We are not sure whether there are 
only a few possibilities (similar to the shortest watchman route problem), i.e.,
whether this subproblem is discrete. So the problem of efficiently computing an
optimal search path in a polygon is still open.

For rectilinear simple polygons we can find better online algorithms based on
a $\sqrt{2}$-competitive online exploration algorithm by Papadimitriou et al. [6], which
can be made DREP by ignoring reflex vertices farther away than $d$. Again, no
polynomial time algorithm for the optimal search path is known.

Theorem 6. For an agent with vision in a simple rectilinear polygon there is an
$8\sqrt{2}$-search-competitive online search algorithm. There is also a polynomial time
8-search-competitive offline search algorithm. □

5.2 Polygons with Holes 

We will show that there is no constant-search-competitive online search algo- 
rithm for polygons with rectangular holes. It was shown by Albers et al. [1] that 
there is no constant-competitive online exploration algorithm for polygons with 
rectangular holes. For $k > 2$, they filled a rectangle of height $k$ and width $2k$
with $O(k^2)$ rectangular holes such that the optimal exploration tour has length
$O(k)$, whereas any online exploration algorithm needs to travel a distance of $\Omega(k^2)$.
The details of the construction are not important here. We just note that 
it has the property that any point p is at most at distance 3k from the start 
point, which is in the lower left corner of the bounding rectangle. 

Theorem 7. For an agent with vision in a polygon with rectangular holes there 
is no constant-search-competitive online search algorithm. 

Proof. We extend the construction of Albers et al. by making the bounding 
rectangle larger and placing a new hole of height k and width 2k just below all 
the holes of the previous construction. The new start point s is again in the 
lower left corner of the bounding rectangle. Any point that is not immediately 
visible from s has at least distance k from s. □ 

Since the offline exploration problem is NP-complete (by a straightforward
reduction from planar TSP), we cannot use our framework to obtain a polynomial
time approximation algorithm of the optimal search path. However, there is an
exponential time 8-approximation algorithm: to compute OPT($P$, $d$), we can try
all orders of the essential cuts and keep the best one. Applying our framework then
gives an approximation factor of 8 for the optimal search ratio.

The results of Koutsoupias et al. [14] imply that the offline problem of com- 
puting an optimal search path in a known polygon with (rectangular) holes is 
NP-complete. 

6 Conclusion and Open Problems 

We have introduced a framework for computing online and offline approxima-
tions of the optimal search path in graph and polygonal environments. We have
obtained fairly simple proofs of the existence of approximation strategies and
their approximation factors, although the factors are quite high. We have also
shown that some environments do not have constant-search-competitive online
search strategies. Our framework would also work for randomized algorithms,
but there are not many randomized exploration algorithms around. For some of
the settings it remains open whether the offline optimization problem is NP-hard.
Abstract. In the incremental versions of Facility Location and k-
Median, the demand points arrive one at a time and the algorithm must
maintain a good solution by either adding each new demand to an ex-
isting cluster or placing it in a new singleton cluster. The algorithm can
also merge some of the existing clusters at any point in time. We present
the first incremental algorithm for Facility Location with uniform facility
costs which achieves a constant performance ratio and the first incremen-
tal algorithm for k-Median which achieves a constant performance ratio
using $O(k)$ medians.



1 Introduction 

The model of incremental algorithms for data clustering is motivated by prac- 
tical applications where the demand sequence is not known in advance and the 
algorithm must maintain a good clustering using a restricted set of operations 
which result in a solution of hierarchical structure. Charikar et al. [3] intro- 
duced the framework of incremental clustering to meet the requirements of data 
classification applications in information retrieval. 

In this paper, we consider the incremental versions of metric Facility Location 
and k-Median. In Incremental k-Median [5], the demand points arrive one at a
time. Each demand must be either added to an existing cluster or placed in a new 
singleton cluster upon arrival. At any point in time, the algorithm can also merge 
some of the existing clusters. Each cluster is represented by its median whose 
location is determined at the cluster’s creation time. When some clusters are 
merged with each other, the median of the new cluster must be selected among 
the medians of its components. The goal is to maintain a solution consisting 
of at most k clusters/medians that minimizes the total assignment cost of the
demands considered so far. The assignment cost of a demand is its distance from 
the median of the cluster the demand is currently included in. 
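The operations of Incremental k-Median can be captured by a small data structure. The following sketch only models the allowed operations and the assignment cost; it does not decide which operation to apply, which is the job of the algorithm in this paper. The class and method names are ours, points in the Euclidean plane serve purely as an example metric, and we assume that a new singleton cluster's median is the demand that opened it.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class IncrementalClustering:
    """Models the operations allowed in Incremental k-Median (no decision rule)."""

    def __init__(self) -> None:
        self.median: Dict[int, Point] = {}         # cluster id -> its median
        self.members: Dict[int, List[Point]] = {}  # cluster id -> demands in it
        self._next_id = 0

    def open_singleton(self, p: Point) -> int:
        """Place demand p in a new cluster; its median is fixed now (here: p itself)."""
        cid = self._next_id
        self._next_id += 1
        self.median[cid] = p
        self.members[cid] = [p]
        return cid

    def add_to_cluster(self, p: Point, cid: int) -> None:
        """Add demand p to the existing cluster cid."""
        self.members[cid].append(p)

    def merge(self, keep: int, other: int) -> None:
        """Merge `other` into `keep`; the new median is one of the old ones (keep's)."""
        self.members[keep].extend(self.members.pop(other))
        del self.median[other]

    def num_clusters(self) -> int:
        return len(self.median)

    def assignment_cost(self) -> float:
        """Sum of distances from each demand to the median of its current cluster."""
        return sum(math.dist(self.median[c], p)
                   for c, ps in self.members.items() for p in ps)


if __name__ == "__main__":
    c = IncrementalClustering()
    a = c.open_singleton((0.0, 0.0))
    c.add_to_cluster((3.0, 4.0), a)
    b = c.open_singleton((10.0, 0.0))
    c.merge(a, b)
    print(c.num_clusters(), c.assignment_cost())  # 1 15.0
```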

* This work was partially supported by the FET programme of the EU under contract
numbers IST-1999-14186 (ALCOM-FT) and IST-2001-33116 (FLAGS). Part of this
Saarbrücken, Germany.
In Incremental Facility Location, demand points arrive one at a time and
must be assigned to either an existing or a new facility upon arrival. At any 
point in time, the algorithm can also merge a facility with another one by closing 
the first facility and re-assigning all the demands currently assigned to it to the 
second facility. The objective is to maintain a solution which minimizes the sum 
of facility and assignment costs. The facility cost only accounts for the facilities 
currently open, while the assignment cost of a demand is its distance from the 
facility the demand is currently assigned to. 
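For concreteness, the objective with a uniform facility cost $f$ (the special case considered in this paper) and the effect of a merge can be written down directly. This minimal sketch represents a solution as a map from open facilities to the demands currently assigned to them; the function names are ours and Euclidean points serve only as an example metric.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Solution = Dict[Point, List[Point]]  # open facility -> demands currently assigned to it


def total_cost(f: float, sol: Solution) -> float:
    """Facility cost f per open facility plus the total assignment cost."""
    facility_cost = f * len(sol)
    assignment_cost = sum(math.dist(fac, p) for fac, ps in sol.items() for p in ps)
    return facility_cost + assignment_cost


def merge_facilities(sol: Solution, closed: Point, target: Point) -> Solution:
    """Close `closed` and reassign all its demands to `target` (target != closed).

    Returns a new solution; the input is left unchanged.
    """
    new = {fac: list(ps) for fac, ps in sol.items() if fac != closed}
    new[target].extend(sol[closed])
    return new


if __name__ == "__main__":
    sol = {(0.0, 0.0): [(0.0, 0.0), (1.0, 0.0)], (3.0, 0.0): [(3.0, 0.0)]}
    print(total_cost(10.0, sol))                                          # 21.0
    print(total_cost(10.0, merge_facilities(sol, (3.0, 0.0), (0.0, 0.0))))  # 14.0
```

In this example the merge saves the facility cost $f = 10$ at the price of increasing the assignment cost by 3; deciding when such a trade is worthwhile is exactly what a merge rule has to do.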

Both problems are motivated by applications of clustering in settings where the
demands arrive online (e.g., clustering the web, data mining) and the algorithm
must maintain a good solution without resorting to complete re-clustering. To
avoid cluster proliferation, the algorithm is allowed to merge existing clusters. In
Incremental k-Median, the number of clusters is fixed in advance. On the other
hand, Incremental Facility Location provides a model for applications where
the number of clusters cannot be determined in advance due to limited a priori
information about the demand sequence. Hence, we introduce a uniform facility
cost enforcing the requirement for a relatively small number of clusters.

We evaluate the performance of incremental algorithms using the perfor- 
mance ratio [3], whose definition is essentially identical to that of the competitive 
ratio (e.g., [2]). An incremental algorithm achieves a performance ratio of c if 
for all demand sequences, the algorithm’s cost is at most c times the cost of an 
optimal offline algorithm, which has full knowledge of the demand sequence, on 
the same instance. We also let n denote the total number of demands. 

Comparison to Online and Streaming Algorithms. Similarly to online 
algorithms, incremental algorithms commit themselves to irrevocable decisions 
made without any knowledge about future demands. More specifically, when a 
new demand arrives, the algorithm may decide to add the demand to an existing 
cluster or merge some clusters with each other. These decisions are irrevocable 
because once formed, clusters cannot be broken up. However, we are not aware 
of any simple and natural notion of irrevocable cost that could be associated 
with the decision that some demands are clustered together. 

Incremental k-Median can be regarded as the natural online version of k-
Median. Nevertheless, Incremental Facility Location is quite different from the
problem of Online Facility Location (OFL) introduced in [10]. In particular,
OFL is motivated by network design applications where re-configurations are
expensive and often infeasible. Thus, the decisions of opening a facility at a
particular location and assigning a demand to a facility are irrevocable as well
as the corresponding costs. On the other hand, Incremental Facility Location
is motivated by clustering applications where merging existing clusters is not
only feasible but also desirable and the important restriction is that existing
clusters should not be broken up. Hence, only the decision that some demands are
clustered together is irrevocable. As a result, the competitive ratio is $\Omega(\frac{\log n}{\log\log n})$
for OFL [6] and constant for Incremental Facility Location.

Incremental algorithms also bear a resemblance to one-pass streaming algo-
rithms for clustering problems (see, e.g., [9] for a formal definition of the streaming
model of computation). In the case of streaming algorithms, the emphasis is on
space and time efficient algorithms which achieve a small approximation ratio
by ideally performing a single scan over the input. A streaming algorithm for
k-Median is not restricted in terms of the solution’s structure or the set of op-
erations available. On the other hand, incremental algorithms must maintain a
good hierarchical clustering by making irrevocable decisions, but they only need
to run in polynomial time. Nevertheless, all known incremental algorithms for
clustering problems can be either directly regarded as or easily transformed to
efficient one-pass streaming algorithms (e.g., [3, 8, 5, 4] and this paper).

Previous Work. Charikar et al. [3] presented incremental algorithms for k-
Center which achieve a constant performance ratio using k clusters. Charikar
and Panigrahy [5] presented an incremental algorithm for Sum k-Radius which
achieves a constant performance ratio using $O(k)$ clusters. They also proved that
any deterministic incremental algorithm for k-Median which maintains at most
k clusters must have a performance ratio of $\Omega(k)$. Determining whether there
exists an incremental algorithm which achieves a constant performance ratio
using $O(k)$ medians is suggested as an open problem in [5].

One-pass streaming algorithms for k-Median which assume that $n$ is known in
advance are presented in [8,4]. For $k$ much smaller than $n^{\epsilon}$, the algorithms
of Guha et al. [8] achieve a performance ratio of […] using […] medians and
run in $O(nk\,\mathrm{poly}(\log n))$ time and space. The algorithm of Charikar et al.
[4] achieves a constant performance ratio with high probability (whp.) using
$O(k \log^2 n)$ medians and runs in $O(nk \log^2 n)$ time and $O(k \log^2 n)$ space.

The algorithms of [10,6,1] for OFL are the only known incremental algorithms
for Facility Location. In OFL [10], the demands arrive one at a time and must
be irrevocably assigned to either an existing or a new facility upon arrival. In
[10], a randomized $O(\frac{\log n}{\log\log n})$-competitive algorithm and a lower bound of $\omega(1)$
are presented. In [6], the lower bound is improved to $\Omega(\frac{\log n}{\log\log n})$ and a determin-
istic $O(\frac{\log n}{\log\log n})$-competitive algorithm is given. In [1], a simpler
and faster deterministic $O(2^d \log n)$-competitive algorithm for $d$-dimensional Eu-
clidean spaces is presented.

The lower bounds of [10,6] hold only if the decision of opening a facility at
a particular location is irrevocable. Hence, they do not apply to Incremental
Facility Location. However, the lower bound of [6] implies that every algorithm
which maintains $o(k \log n)$ facilities must incur a total initial assignment cost of
$\omega(1)$ times the optimal cost, where the initial assignment cost of a demand is its
distance from the first facility the demand is assigned to. Therefore, every algo-
rithm treating merge as a black-box operation cannot approximate the optimal
assignment cost within a constant factor unless it uses $\Omega(k \log n)$ facilities (e.g.,
the algorithm of [4]). In other words, to establish a constant performance ratio,
one needs a merge rule which can provably decrease both the algorithm’s facility
and assignment costs.

Contribution. We present the first incremental algorithm for metric Facility
Location with uniform facility costs which achieves a constant performance ra-
tio. The algorithm combines a simple rule for opening new facilities with a novel
merge rule based on distance instead of cost considerations. We use a new tech-
nique to prove that a case similar to the special case in which the optimal solution
consists of a single facility is the dominating case in the analysis. This technique
is also implicit in [6] and may find applications to other online problems. To
overcome the limitation imposed by the lower bound of [6], we establish that in
the dominating case, merge operations also decrease the assignment cost.

Using the algorithm for Facility Location as a building block, we obtain the first incremental algorithm for k-Median that achieves a constant performance ratio using O(k) medians. This resolves the open question of [5]. In other words, we improve the number of medians required for a constant performance ratio from O(k log² n) to O(k). Combining our techniques with those of [4], we obtain a randomized incremental algorithm that achieves a constant performance ratio whp. using O(k) medians. It runs in O(nk² log² n) time and O(k² log² n) space. This algorithm can also be regarded as a time- and space-efficient one-pass streaming algorithm for k-Median.

The proofs and technical details omitted from this extended abstract due to space constraints can be found in the full version of this paper [7].

Notation. We only consider unit demands and allow multiple demands to be located at the same point. For Incremental Facility Location, we restrict our attention to the special case of uniform facility costs, where the cost of opening a facility, denoted by f, is the same for all points. We also use the terms facility, median and cluster interchangeably.

A metric space M = (M, d) is usually identified by its point set M. The distance function d is non-negative, symmetric, and satisfies the triangle inequality. For points u, v ∈ M and a subspace M' ⊆ M:
- d(u, v) denotes the distance between u and v;
- D(M') denotes the diameter of M';
- d(M', u) denotes the distance between u and the nearest point in M'.

By convention, d(∅, u) = ∞. For a point u ∈ M and a non-negative number r, Ball(u, r) = {v ∈ M : d(u, v) ≤ r}.

2 An Incremental Algorithm for Facility Location 

The algorithm Incremental Facility Location (IFL, Fig. 1) maintains three things. The first is its facility configuration F. The second is its merge configuration, which consists of a merge ball Ball(w, m(w)) for each facility w ∈ F. The third is the set L of unsatisfied demands.

A demand becomes unsatisfied, i.e., is added to the set L, upon arrival. Each unsatisfied demand holds a potential always equal to its distance to the nearest existing facility. Consider the unsatisfied neighborhood B_u = Ball(u, d(F, u)/x) ∩ L of a new demand u. If B_u has accumulated a potential of βf, a new facility opens at the same point as u. Then, the demands in B_u become satisfied and lose their potential. Hence, the notion of unsatisfied demands ensures that each demand contributes to the facility cost at most once.

Each facility w ∈ F maintains three pieces of state:
- the set Init(w) of the demands initially assigned to w (namely, the demands assigned to w upon arrival and not after a merge operation);
- its merge radius m(w);
- the corresponding merge ball Ball(w, m(w)).

w is the only facility in its merge ball. In particular, when w opens, the merge radius of w is initialized to a fraction of the distance between w and the nearest existing facility. Then, if a new facility w' opens in w’s merge ball, w is merged with w'. The algorithm keeps decreasing m(w) so that no merge operation can dramatically increase the assignment cost of the demands in Init(w). More specifically, it maintains the invariant

$$|\mathrm{Init}(w) \cap \mathrm{Ball}(w, m(w)/\psi)| \cdot m(w) \le \beta f \qquad (1)$$

Let x, β, and ψ be constants.

For each new demand u:
    L ← L ∪ {u};  r_u ← d(F, u)/x;
    B_u ← Ball(u, r_u) ∩ L;
    Pot(B_u) = Σ_{v ∈ B_u} d(F, v);
    if Pot(B_u) ≥ βf then
        Let w' be the location of u;
        open(w');  L ← L \ B_u;
        for each w ∈ F \ {w'} do
            if d(w, w') ≤ m(w) then
                merge(w → w');
    Let w be the facility in F closest to u;
    update_merge_radius(m(w));
    initial_assignment(u, w);

open(w')
    F ← F ∪ {w'};  Init(w') ← ∅;
    C(w') ← ∅;  m(w') ← 3r_u;

merge(w → w')
    F ← F \ {w};
    C(w') ← C(w') ∪ C(w);

update_merge_radius(m(w))
    m_2(w) = max{r : |Ball(w, r/ψ) ∩ (Init(w) ∪ {u})| · r ≤ βf};
    m(w) = min{m(w), m_2(w)};

initial_assignment(u, w)
    Init(w) ← Init(w) ∪ {u};
    C(w) ← C(w) ∪ {u};

Fig. 1. The algorithm Incremental Facility Location (IFL).

After the algorithm has updated its configuration, it assigns the new demand 
to the nearest facility. We always distinguish between the arrival and the assign- 
ment time of a demand because the algorithm’s configuration may have changed 
in between. 
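The Python sketch below makes Fig. 1 concrete. It rests on assumptions of ours, not the author’s:
- Demands are points in the plane with Euclidean distance, and balls are closed.
- Facilities get integer ids, so a new facility at the location of an old one is handled by the merge rule.
- m_2 is computed as a supremum by scanning the radii at which the ball changes.
- A new facility’s merge radius is initialized to 3r_u, which is our reading of Fig. 1.
- The default β = 20 is an arbitrary placeholder, not a value from the text.

```python
import math


class IFL:
    """Sketch of Incremental Facility Location (Fig. 1) for points in the plane.

    Assumptions (ours, for illustration): Euclidean distances, closed balls,
    integer facility ids, m(w') initialized to 3*r_u, m_2 taken as a supremum.
    """

    def __init__(self, f, x=18.0, beta=20.0, psi=4.0):
        self.f, self.x, self.beta, self.psi = f, x, beta, psi
        self.loc = {}      # facility id -> location (the configuration F)
        self.m = {}        # facility id -> merge radius m(w)
        self.init = {}     # facility id -> Init(w), demands initially assigned to w
        self.cluster = {}  # facility id -> C(w), demands currently assigned to w
        self.L = []        # unsatisfied demands
        self._next_id = 0

    def _d_F(self, p):
        """d(F, p); infinite while F is empty."""
        return min((math.dist(q, p) for q in self.loc.values()), default=math.inf)

    def _m2(self, w, u):
        """sup{r : |Ball(w, r/psi) ∩ (Init(w) ∪ {u})| * r <= beta * f}."""
        budget = self.beta * self.f
        t = sorted(self.psi * math.dist(self.loc[w], p) for p in self.init[w] + [u])
        t.append(math.inf)
        for j in range(1, len(t)):
            # For r in [t[j-1], t[j]) the ball contains exactly j of these demands.
            if budget / j < t[j - 1]:
                return t[j - 1]
            if budget / j < t[j]:
                return budget / j
        return math.inf  # not reached: the last threshold is infinite

    def add(self, u):
        """Process one new demand u (a coordinate tuple); return its facility id."""
        self.L.append(u)
        r_u = self._d_F(u) / self.x
        inside = [math.dist(u, v) <= r_u for v in self.L]             # membership in B_u
        pot = sum(self._d_F(v) for v, b in zip(self.L, inside) if b)   # Pot(B_u)
        if pot >= self.beta * self.f:
            w_new, self._next_id = self._next_id, self._next_id + 1
            old = list(self.loc)                                       # F \ {w'}
            self.loc[w_new], self.m[w_new] = u, 3 * r_u                # open(w')
            self.init[w_new], self.cluster[w_new] = [], []
            self.L = [v for v, b in zip(self.L, inside) if not b]      # L <- L \ B_u
            for w in old:
                if math.dist(self.loc[w], u) <= self.m[w]:             # w' in w's merge ball
                    self.cluster[w_new] += self.cluster.pop(w)          # merge(w -> w')
                    del self.loc[w], self.m[w], self.init[w]
        w = min(self.loc, key=lambda k: math.dist(self.loc[k], u))     # nearest facility
        self.m[w] = min(self.m[w], self._m2(w, u))                     # update_merge_radius
        self.init[w].append(u)                                         # initial_assignment
        self.cluster[w].append(u)
        return w

    def cost(self):
        """(facility cost, assignment cost) of the current solution."""
        asg = sum(math.dist(self.loc[w], p) for w, ps in self.cluster.items() for p in ps)
        return self.f * len(self.loc), asg
```

For instance, you can feed it a few copies of two tight, well-separated groups of points and then call `cost()`. It returns the facility and assignment costs of the maintained solution. Every demand sits in exactly one cluster C(w). The first demand always opens a facility, because d(F, u) = ∞ while F is empty.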

If the demands considered by IFL occupy m different locations, IFL can be implemented in O(nm|F_max|) time and O(min{n, m|F_max|}) space. Here, |F_max| is the maximum number of facilities in F at any point in time. The remainder of this section is devoted to the proof of the following theorem.

Theorem 1. For every x > 18, β > [...], and ψ ∈ [4, 5], IFL achieves a constant performance ratio for Facility Location with uniform facility costs.

Preliminaries. For an arbitrary fixed sequence of demands, we compare the algorithm’s cost with the cost of the optimal facility configuration, denoted by F*. We reserve the term (optimal) center for the optimal facilities in F* and the term facility for the algorithm’s facilities in F. In the optimal solution, each demand u is assigned to the nearest center in F*, denoted by c_u. We use the clustering induced by F* to map the demands and the algorithm’s facilities to the optimal centers. In particular, a demand u is always mapped to c_u. Similarly, a facility w is mapped to the nearest optimal center, denoted by c_w. We also use the following costs:
- d*_u = d(c_u, u) is the optimal assignment cost of u;
- Fac* = |F*| f is the optimal facility cost;
- Asg* = Σ_u d*_u is the optimal assignment cost.
To refer to the algorithm’s configuration at the moment a demand is considered (resp. a facility opens), we use the demand’s (resp. facility’s) identifier as a subscript. We distinguish between the algorithm’s configuration at the demand’s arrival time and at its assignment time. Plain symbols refer to the former and primed symbols to the latter. For example, for a demand u, F_u (resp. F'_u) is the facility configuration at u’s arrival (resp. assignment) time. Similarly, for a facility w, F_w (resp. F'_w) is the facility configuration just before (resp. after) w opens.

For each facility w, we let B_w = Ball(w, d(F_w, w)/x) ∩ L_w denote the unsatisfied neighborhood contributing its potential towards opening w (i.e., if u is the demand opening w, then B_w = B_u). When an existing facility w is merged with a new facility w', the existing facility w is closed. The demands currently assigned to w are then re-assigned to the new facility w' (and not the other way around).

Basic Properties. The following lemmas establish the basic properties of IFL.

Lemma 1. For every facility w mapped to c_w, d(c_w, w) ≤ d(F_w, c_w)/3.

Proof Sketch. Each new facility is opened by a small unsatisfied neighborhood with a large potential. If there were no optimal centers close to such a neighborhood, the cost of the optimal solution could decrease by opening a new center.

For a formal proof, we assume that there is a facility w such that d(c_w, w) > d(F_w, c_w)/3. Since B_w ⊆ Ball(w, d(F_w, w)/x), we can show that for each u ∈ B_w, d(u, w) ≤ [...] and d(F_w, u) ≤ [...]. Using Pot(B_w) ≥ βf, we conclude that for β appropriately large, f + Σ_{u∈B_w} d(u, w) < Σ_{u∈B_w} d*_u. This contradicts the optimality of F*. □

Lemma 2. For every facility w, there will always exist a facility in Ball(w, m(w)). Moreover, every demand currently assigned to w will remain assigned to a facility in Ball(w, m(w)).

Proof Sketch. Every time a facility w is merged with a new facility w', the merge radius of w' is a fraction of the merge radius of w. Formally, if w is merged with a new facility w', then d(w, w') ≤ m(w) and m(w') ≤ [...] d(w, w'). Hence, Ball(w', [...] m(w')) ⊆ Ball(w, [...] m(w)). □

Facility Cost. We distinguish between supported and unsupported facilities. A facility w is supported if Asg*(B_w) = Σ_{u∈B_w} d*_u ≥ βf/(3x), and unsupported otherwise.

The cost of each supported facility w can be directly charged to the optimal assignment cost of the demands in B_w. Since each demand contributes to the facility cost at most once, the total cost of supported facilities is at most (3x/β) Asg*.

On the other hand, the merge ball of each unsupported facility w mapped to the optimal center c_w includes a sufficiently large area around c_w. Thus, each unsupported facility w mapped to c_w is merged with the first new facility w' also mapped to c_w, because w' is included in w’s merge ball. Consequently, there can be at most one unsupported facility mapped to each optimal center. Hence, the total cost of unsupported facilities never exceeds Fac*. The algorithm’s facility cost is therefore at most Fac* + (3x/β) Asg*.
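Written out, the charging argument for the facility cost is a two-line calculation. It assumes that a facility is supported when Asg*(B_w) ≥ βf/(3x), as the later charging step "3x Asg*(B_w) ≥ βf" indicates. It also uses the fact that the sets B_w are pairwise disjoint, since their demands become satisfied when w opens:

$$\sum_{w \text{ supported}} f \;\le\; \frac{3x}{\beta}\sum_{w \text{ supported}} \mathrm{Asg}^*(B_w) \;\le\; \frac{3x}{\beta}\,\mathrm{Asg}^*, \qquad \sum_{w \text{ unsupported, open}} f \;\le\; |F^*|\,f \;=\; \mathrm{Fac}^*.$$

Adding the two bounds gives the bound Fac* + (3x/β) Asg* on the algorithm’s facility cost. The constant 3x/β matches b_1 in the concluding bound.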
Lemma 3. Let w be an unsupported facility mapped to an optimal center c_w, and let w' be a new facility also mapped to c_w. If w' opens while w is still open, then w is merged with w'.

Proof Sketch. By Lemma 1, d(c_w, w') ≤ d(F_{w'}, c_w)/3 ≤ d(c_w, w)/3, because w' is mapped to c_w and w ∈ F_{w'} by hypothesis. We can also prove that m(w) ≥ 3d(c_w, w)/2. This implies the lemma, because d(w, w') ≤ d(c_w, w) + d(c_w, w') ≤ 4d(c_w, w)/3 < m(w). Hence w must be merged with w'. □

Assignment Cost. We start with a sketch of the analysis in the special case 
that F* consists of a single optimal center, denoted by c. The same ideas can be 
applied to the case that some optimal centers are very close to each other and 
isolated from the remaining centers in F* (cf. isolated active coalitions). 

Let w be the facility which is currently the nearest facility to c. By Lemma 1, 
w stops being the nearest facility to c as soon as a new facility w' opens. By 
Lemma 3, if w is an unsupported facility, it is merged with w' . The main idea 
is that as long as the algorithm keeps merging existing facilities with new ones 
(which are significantly closer to c), the algorithm’s facility configuration con- 
verges rapidly to c and the algorithm’s assignment cost converges to the optimal 
assignment cost. In addition, the rule for opening new facilities ensures that the 
algorithm’s assignment cost for the demands arriving after w’s and before w'’s 
opening remains within a constant factor from the optimal cost. A simple induc- 
tion yields that the assignment cost for the demands arriving as long as existing 
facilities are merged with new ones (cf. good inner demands) remains within a 
constant factor from the optimal cost. 

However, if w is a supported facility, it may not be merged with the new facility w'. The chain of merge operations converging to the optimal center has then come to an end, so the algorithm is charged with an irrevocable cost. This cost accounts for the assignment cost of the demands whose facility has stopped converging to c (cf. bad inner demands). It is charged to the optimal assignment cost of the demands in B_w, which is at least βf/(3x) since w is a supported facility.

For the analysis of non-isolated sets of optimal centers (cf. non-isolated active 
coalitions), it is crucial that Lemma 2 implies a notion of distance (cf. config- 
uration distance) according to which the algorithm’s configuration converges 
monotonically to each point of the metric space. The analysis is based on the 
fact that a set of optimal centers can be non-isolated only if its distance to the 
algorithm’s configuration lies in a relatively short interval. This implies that the 
case dominating the analysis and determining the algorithm’s performance ratio 
is that of isolated sets of optimal centers. 

We proceed to formally define the basic notions used in the analysis of the algorithm’s assignment cost. In the following, λ = 3x + 2, ρ = (ψ + 2)(x + 2), and γ = 12ρ are appropriately chosen constants.

Configuration Distance. For an optimal center c ∈ F* and a facility w ∈ F, the configuration distance between c and w, denoted by g(c, w), is g(c, w) = d(c, w) + m(w). For an optimal center c ∈ F*, the configuration distance of c, denoted by g(c), is g(c) = min_{w∈F} {g(c, w)} = min_{w∈F} {d(c, w) + m(w)}.

The configuration distance g{c) is non-increasing with time and there always 
exists a facility within a distance of g{c) from c (Lemma 2). 

Coalitions. A set of optimal centers K ⊆ F* with representative c_K ∈ K forms a coalition as long as g(c_K) > ρD(K). A coalition K becomes broken as soon as g(c_K) ≤ ρD(K). A coalition K is isolated if g(c_K) < sep(K)/3, and non-isolated otherwise. Here, sep(K) = d(K, F* \ K) denotes the distance separating the optimal centers in K from the remaining optimal centers.

Intuitively, as long as K’s diameter is much smaller than g(c_K) (K is a coalition), the algorithm behaves as if K were a single optimal center. Suppose, moreover, that the algorithm is bound to have a facility that is significantly closer to K than any optimal center not in K (K is isolated). Then, as far as K is concerned, the algorithm behaves as if there were no optimal centers outside K.
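The definitions of configuration distance, broken coalitions and isolation can be checked mechanically. The sketch below assumes Euclidean points in the plane and represents the current configuration as a dict from facility location to merge radius. It takes ρ as an input; the value used in the test is illustrative, not the ρ = (ψ + 2)(x + 2) of the analysis. All function names are hypothetical.

```python
import math


def config_distance(c, facilities):
    """g(c) = min over open facilities w of d(c, w) + m(w).

    facilities maps a facility location (tuple) to its merge radius m(w).
    """
    return min((math.dist(c, w) + m for w, m in facilities.items()), default=math.inf)


def diameter(K):
    """D(K): largest pairwise distance within K."""
    return max((math.dist(a, b) for a in K for b in K), default=0.0)


def separation(K, centers):
    """sep(K) = d(K, F* minus K)."""
    others = [c for c in centers if c not in K]
    return min((math.dist(a, b) for a in K for b in others), default=math.inf)


def coalition_status(K, rep, centers, facilities, rho):
    """Classify K as 'broken', 'isolated' or 'non-isolated' under the current configuration."""
    g = config_distance(rep, facilities)
    if g <= rho * diameter(K):
        return "broken"
    return "isolated" if g < separation(K, centers) / 3 else "non-isolated"
```

The function reports the status for one snapshot of the configuration. Because g(c_K) only decreases over time, a coalition reported as broken (resp. isolated) at some point stays broken (resp. cannot become non-isolated) in later snapshots.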

A hierarchical decomposition 𝒦 of F* is a complete laminar set system¹ on F*. Every hierarchical decomposition of F* contains at most 2|F*| − 1 sets. Given a hierarchical decomposition 𝒦 of F*, we can fix an arbitrary representative c_K for each K ∈ 𝒦. We then regard 𝒦 as a system of coalitions that hierarchically cover F*.

Formally, fix a hierarchical decomposition 𝒦 of F* and the current algorithm’s configuration. A set K ∈ 𝒦 is an active coalition if two conditions hold. First, K is still a coalition, i.e., g(c_K) > ρD(K). Second, every superset of K in 𝒦 has become broken, i.e., g(c_{K'}) ≤ ρD(K') for every K' ∈ 𝒦 with K ⊂ K'. The current algorithm’s configuration induces a collection of active coalitions that partition F*. Since g(c_K) is non-increasing, no coalition that has become broken (resp. isolated) can become active (resp. non-isolated) again.
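A complete laminar set system, as defined in the footnote, is easy to test directly. The sketch below is a plain brute-force check over all pairs of sets, written for illustration only. It also lets us confirm the 2|F*| − 1 bound on a small example.

```python
from itertools import combinations


def is_laminar(family):
    """True if no two sets form an intersecting pair (each pair is disjoint or nested)."""
    return not any(A & B and A - B and B - A for A, B in combinations(family, 2))


def is_hierarchical_decomposition(family, centers):
    """Complete laminar set system on `centers`: laminar, contains F* and every singleton."""
    F = frozenset(centers)
    sets = {frozenset(s) for s in family}
    return (all(s <= F for s in sets) and is_laminar(sets)
            and F in sets and all(frozenset([c]) in sets for c in F))
```

For F* = {1, 2, 3, 4}, the family {1}, {2}, {3}, {4}, {1, 2}, {3, 4}, {1, 2, 3, 4} passes and has exactly 2·4 − 1 = 7 sets. Adding {2, 3} breaks laminarity, because {2, 3} and {1, 2} form an intersecting pair.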

Let D_M(K) = max{D(K), [...] sep(K)}. By definition, K becomes either isolated or broken as soon as g(c_K) ≤ ρD_M(K). Using [6, Lemma 1], we can show that there is a hierarchical decomposition of F* such that no coalition K becomes active before g(c_K) ≤ (ρ + 1)γ² D_M(K). In the following, we assume that the set of active coalitions is given by a fixed hierarchical decomposition 𝒦 of F* with the following property: for every non-isolated active coalition K, ρD_M(K) < g(c_K) ≤ (ρ + 1)γ² D_M(K).

Each new demand u is mapped to the unique active coalition containing c_u when u arrives. Each new facility w is mapped to the unique active coalition containing c_w just before w opens. For an isolated active coalition K, we let w_K denote the nearest facility to K’s representative c_K at any given point in time. In other words, w_K is a function that always associates the isolated active coalition K with the facility in F that is currently nearest to c_K.

Final Assignment Cost. Let u be a demand currently assigned to a facility w with merge radius m(w). The final assignment cost of u, denoted by d̄_u, is

$$\bar d_u = \begin{cases} \min\{\, d(u,w) + [\dots]\, m(w),\ (1 + [\dots]) \max\{d(c_K, w), \lambda D(K)\} + d^*_u \,\} & \text{if } u \text{ is mapped to an isolated active coalition } K \text{ and } w = w_K,\\ d(u, w) + m(w) & \text{otherwise.} \end{cases}$$

If u is currently assigned to a facility w, it will remain assigned to a facility in Ball(w, m(w)) (Lemma 2). We can also prove the following. If u is mapped to an isolated active coalition K and is currently assigned to w_K, then u’s assignment cost will never exceed (1 + [...]) max{d(c_K, w_K), λD(K)} + d*_u. Therefore, the final assignment cost of u according to the current algorithm’s configuration is an upper bound on its actual assignment cost in the future.

¹ A set system is laminar if it contains no intersecting pair of sets. The sets K, K' form an intersecting pair if none of K \ K', K' \ K and K ∩ K' is empty. A laminar set system on F* is complete if it contains F* and every singleton set {c}, c ∈ F*.

Inner and Outer Demands. A demand u mapped to a non-isolated active coalition K is inner if d*_u < D_M(K), and outer otherwise. A demand u mapped to an isolated active coalition K is inner if d*_u < max{d(c_K, w'_K), λD(K)}, and outer otherwise. In this definition, w'_K denotes the nearest facility to c_K at u’s assignment time.

We can prove that the final assignment cost of each outer demand is within 
a constant factor from its optimal assignment cost. From now on, we entirely 
focus on bounding the total assignment cost of inner demands throughout the 
execution of the algorithm. 

Good and Bad Inner Demands. At any point in time, the set of good demands of an isolated active coalition K is denoted by G_K. It consists of the inner demands of K that have always (i.e., from their assignment time until the present time) been assigned to w_K, the nearest facility to c_K. G_K is empty as long as K is either not active or non-isolated. We call every inner demand of K that is not good bad.

Each new inner demand mapped to an isolated active coalition K is initially assigned to the nearest facility to c_K. This is because this facility is much closer to c_K than any other facility (see also Lemma 1), and inner demands are included in a small ball around c_K. Hence, each new inner demand mapped to K becomes good.

An inner demand remains good as long as its assignment cost converges to its optimal assignment cost. In particular, a good inner demand becomes bad as soon as one of two events occurs. Either K becomes broken, or the location of the nearest facility to c_K changes and the facility at the former location w_K is not merged with the facility at the new location w'_K. A bad demand can never become good again.

With the exception of good demands, each demand is irrevocably charged with its final assignment cost at its assignment time. We keep track of the actual assignment cost of good demands until they become bad. This is possible because the good demands of an isolated active coalition K are always assigned to the nearest facility to c_K. Good demands are irrevocably charged with their final assignment cost at the moment they become bad.

Isolated Coalitions. We use the following lemma to bound the assignment cost of the inner demands mapped to an isolated active coalition K.

Lemma 4. Let K be an isolated active coalition. The total actual and the total final assignment cost of the good inner demands of K satisfy

$$\sum_{u \in G_K} d(u, w_K) \le 2\beta f + 3\sum_{u \in G_K} d^*_u \qquad\text{and}\qquad \sum_{u \in G_K} \bar d_u \le 4.5\beta f + 7\sum_{u \in G_K} d^*_u .$$

Proof Sketch. We sketch the proof of the first inequality. The second inequality is derived from the first one using the definition of the final assignment cost.

The proof is by induction over a sequence of merge operations in which the former nearest facility to c_K is merged with the new nearest facility to c_K. Let w be the nearest facility to c_K, i.e., w_K = w. We can show that for any isolated coalition K, the location of the nearest facility to c_K cannot change until a new facility mapped to K opens. Let w' be the next facility mapped to K, and let G_K be the set of good demands just before w' opens. By Lemma 1, the location of the nearest facility to c_K changes from w_K = w to w'_K = w' as soon as w' opens. We inductively assume that the inequality holds just before w' opens. We then show that it remains valid until either the location of the nearest facility to c_K changes again or K becomes broken.

If w is not merged with w', the set of good demands becomes empty and the inequality holds trivially. If w is merged with w', we can apply Lemma 1 and show that for every u ∈ G_K, d(u, w') ≤ [...] d(u, w) + [...] d*_u. Just after w has been merged with w', the set of good demands remains G_K, but each good demand is now assigned to w'. Combining the inductive hypothesis with the inequality above, we prove that Σ_{u∈G_K} d(u, w') ≤ βf + 3Σ_{u∈G_K} d*_u.

As long as w' remains the nearest facility to c_K and K remains an isolated active coalition, each new inner demand mapped to K is initially assigned to w' and becomes a good demand. The rule for opening new facilities and Ineq. (1) imply that the total initial assignment cost of these demands is at most βf. □

As long as K remains an isolated active coalition, it holds a credit of O(βf). This credit absorbs the additive term of 2βf, which corresponds to the part of the actual assignment cost of good inner demands that cannot be charged to their optimal assignment cost. The good inner demands of K are charged with their final assignment cost as soon as they become bad and G_K becomes empty. There are two cases.
- If G_K becomes empty because K becomes broken, the additive term of 4.5βf in the final assignment cost of the demands becoming bad is charged to K’s credit.
- Otherwise, G_K becomes empty because the location of the nearest facility to c_K has changed, and the facility w at the previous location w_K is not merged with the new facility w' at the new location w'_K. By Lemma 3, the facility w must be a supported facility. Hence, the additive term of 4.5βf can be charged to the optimal assignment cost of the demands in B_w, since 3x Asg*(B_w) ≥ βf.

Lemma 1 implies that each supported facility is charged at most once with the final assignment cost of good demands that become bad.

Non-isolated Coalitions. The inner demands mapped to non-isolated active coalitions are irrevocably charged with their final assignment cost at their assignment time. To bound their total final assignment cost, we develop a standard potential function argument based on the notion of unsatisfied inner demands. The set of unsatisfied inner demands of a non-isolated active coalition K is denoted by N_K. It consists of the inner demands of K that are currently unsatisfied, i.e., in the set L. N_K is empty as long as K is either isolated or not active.

Lemma 5. For every non-isolated active coalition K, |N_K| · g(c_K) ≤ (ψ + 4)γ²βf.

As long as K remains a non-isolated active coalition, c_K has a deficit of 5|N_K| · g(c_K). By Lemma 5, this deficit never exceeds 5(ψ + 4)γ²βf. Suppose a new demand u is an inner demand of K and remains unsatisfied at its assignment time. Then we can prove that its final assignment cost is at most 5g'_u(c_K). (Recall that g'_u(c_K) denotes the configuration distance of c_K at u’s assignment time.) Since u is added to N_K, the increase in the deficit of c_K compensates for the final assignment cost of u.

The deficit of c_K is 0 before K becomes active and after K has become either isolated or broken. Hence, the total final assignment cost of the inner demands mapped to K, while K remains a non-isolated active coalition, does not exceed the total decrease in the deficit of c_K.

We bound this decrease in two steps. Each time the configuration distance g(c_K) decreases but the set of unsatisfied inner demands N_K remains unaffected, the deficit of c_K decreases by a factor of ln([...]). We can also prove the following. Each time opening a new facility causes some demands to become satisfied and be removed from N_K, g(c_K) decreases by a factor of 3. Thus, the deficit of c_K again decreases by a factor of ln([...]).

A non-isolated active coalition K becomes either isolated or broken by the time g(c_K) has decreased by a factor of (1 + 1/ρ)γ². Therefore, the total decrease in the deficit of c_K is at most O(5(ψ + 4)γ²βf ln((1 + 1/ρ)γ²)). We increase this cost by O(βf ln((1 + 1/ρ)γ²)) to take into account one more group of demands. These are the inner demands of K that open a new facility and therefore do not remain unsatisfied at their assignment time.

Concluding the Proof of Theorem 1. Let u_1, ..., u_n be the demand sequence considered by IFL. Putting everything together, we conclude the following after the demand u_i has been considered:
- the facility cost of IFL does not exceed a_1 Fac* + b_1 Asg*_i;
- the assignment cost of IFL does not exceed a_2 Fac* + b_2 Asg*_i.

Here, Asg*_i = Σ_{j=1}^{i} d*_{u_j}, and the constants are:
- a_1 = 1;
- a_2 = 2β ln(3γ²)(5(ψ + 4)γ² + 3);
- b_1 = 3x/β;
- b_2 = 4((ρ + 1)γ² + 2) + 14x.

With a more careful analysis, we can improve a_2 to 4β log(γ)(12(ψ + 2) + 3) and b_2 to (λ + 2)(8ψ + 25).

3 An Incremental Algorithm for k-Median

The incremental algorithm for k-Median is based on the following standard lemma. Recall that a_1, a_2, b_1, and b_2 denote the constants in the performance ratio of IFL.

Lemma 6. Let Asg* be the optimal assignment cost for an instance of k-Median. For any Λ > 0, IFL with facility cost f = [...] maintains a solution of assignment cost no greater than (a_2 + b_2)Asg* + b_2Λ with no more than (a_1 + [...])k medians.

The deterministic version of the incremental algorithm operates in phases, using IFL as a building block. Phase i is characterized by an upper bound Λ_i on the optimal assignment cost for the demands considered in the current phase. The algorithm invokes IFL with facility cost f_i = [...]. A phase ends as soon as either the number of medians exceeds νk or the assignment cost exceeds [...], where ν and μ are appropriately chosen constants. By Lemma 6, at that point we can be sure that the optimal cost has also exceeded Λ_i. The algorithm then does three things:
- it merges the medians produced by the current phase with the medians produced by the previous phases;
- it increases the upper bound by a constant factor;
- it proceeds with the next phase.
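The following worked step shows why growing the bound by a constant factor keeps the bounds of earlier phases under control. It assumes, for illustration, that the bound is multiplied by the same constant c > 1 at the end of every phase, so that Λ_j = c^j Λ_0; the text only says "a constant factor". Under that assumption, the bounds of all phases up to phase i form a geometric series dominated by the last one:

$$\sum_{j=0}^{i} \Lambda_j \;=\; \Lambda_0 \sum_{j=0}^{i} c^{j} \;=\; \Lambda_0\,\frac{c^{i+1}-1}{c-1} \;\le\; \frac{c}{c-1}\,\Lambda_i .$$

For example, with c = 2 the bounds of all phases so far sum to at most twice the bound of the current phase.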

Theorem 2. There exists a deterministic incremental algorithm for k-Median that achieves a constant performance ratio using O(k) medians.

The randomized version also proceeds in phases. Phase i invokes the algorithm Para_Cluster [4] with facility cost f_i = [...]. This generates a modified instance that can be represented in a space-efficient manner. The modified instance contains the same number of demands, which now occupy only O(k log² n) different gathering points. Then, the algorithm invokes IFL with facility cost f_i = [...] to cluster the modified instance.

For an incremental implementation, each new demand is first moved to a 
gathering point. Then, a new demand located at the corresponding gathering 
point is given to IFL, which assigns it to a median. Both actions are completed 
before the next demand is considered. The current phase ends if either the num- 
ber of gathering points, the gathering cost, the number of medians, or the as- 
signment cost on the modified instance become too large. We should emphasize 
that IFL treats the demands moved to the same gathering point as different 
demands and may put them in different clusters. 

In contrast to the deterministic version, which does not require any advance knowledge of n, the randomized version needs to know log n in advance.

Theorem 3. There exists a randomized incremental algorithm for k-Median that runs in O(nk² log² n) time and O(k² log² n) space and achieves a constant performance ratio whp. using O(k) medians.
Abstract. We present a new algorithm for dynamic prefix-free coding,
based on Shannon coding. We give a simple analysis and prove a better 
upper bound on the length of the encoding produced than the corre- 
sponding bound for dynamic Huffman coding. We show how our algo- 
rithm can be modified for efficient length-restricted coding, alphabetic 
coding and coding with unequal letter costs. 

1 Introduction 

Prefix-free coding is a well-studied problem in data compression and combinatorial optimization. For this problem, we are given a string S = s_1 ⋯ s_m drawn from an alphabet of size n and must encode each character by a self-delimiting binary codeword. Our goal is to minimize the length of the entire encoding of S.
For static prefix-free coding, we are given all of S before we start encoding and 
must encode every occurrence of the same character by the same codeword. The 
assignment of codewords to characters is recorded as a preface to the encoding. 
For dynamic prefix-free coding, we are given S character by character and must 
encode each character before receiving the next one. We can use a different code- 
word for different occurrences of the same character, we do not need a preface 
to the encoding and the assignment of codewords to characters cannot depend 
on the suffix of S not yet encoded. 

The best-known algorithms for static coding are by Shannon [15] and Huffman [8]. Shannon’s algorithm uses at most (H + 1)m + O(n log n) bits to encode S, where

$$H = \sum_{a \in S} \frac{\#_a(S)}{m} \log \frac{m}{\#_a(S)}$$

is the empirical entropy of S and #_a(S) is the number of occurrences of the character a in S. By log we mean log₂. Shannon proved a lower bound of Hm bits for all coding algorithms, whether or not they are prefix-free. Huffman’s algorithm produces an encoding that, excluding the preface, has minimum length. The total length is (H + r)m + O(n log n) bits, where 0 < r < 1 is a function of the character frequencies in S [3].

Both algorithms assign codewords to characters by constructing a code-tree. A code-tree is a binary tree whose left and right edges are labelled by 0s and 1s, respectively, and whose leaves are labelled by the distinct characters in S. The codeword assigned to a character a in S is the sequence of edge labels on the path from the root to the leaf labelled a. Shannon’s algorithm builds a code-tree in which, for a ∈ S, the leaf labelled a is of depth at most ⌈log(m/#_a(S))⌉. Huffman’s algorithm builds a Huffman tree for the frequencies of the characters in S.

A Huffman tree for a sequence of weights w_1, ..., w_n is a binary tree with two properties. Its leaves, in some order, have weights w_1, ..., w_n. Among all such trees, it minimizes the weighted external path length. To build a Huffman tree for w_1, ..., w_n, we start with n trees, each consisting of just a root. At each step, we make the two roots with smallest weights, w_i and w_j, into the children of a new root with weight w_i + w_j.
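The quantities above can be computed directly, which makes the bounds easy to check on a small string. The following sketch computes the empirical entropy H, Shannon’s depth bounds ⌈log(m/#_a(S))⌉, and the leaf depths of a Huffman tree built by the merging rule just described. It uses a binary heap; the function names are ours, and ties are broken by insertion order.

```python
import heapq
import math
from collections import Counter


def entropy(s):
    """Empirical entropy H of s in bits per character."""
    m = len(s)
    return sum(c / m * math.log2(m / c) for c in Counter(s).values())


def shannon_lengths(s):
    """Shannon's depth bound ceil(log(m / #_a(S))) for every distinct character a."""
    m = len(s)
    return {a: math.ceil(math.log2(m / c)) for a, c in Counter(s).items()}


def huffman_depths(weights):
    """Leaf depths of a Huffman tree for a {leaf: weight} map."""
    heap = [(w, i, [a]) for i, (a, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(weights, 0)
    tie = len(heap)  # unique tie-breaker so the leaf lists are never compared
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)  # the two roots with smallest weights
        w2, _, leaves2 = heapq.heappop(heap)
        for a in leaves1 + leaves2:
            depth[a] += 1                      # every leaf under the new root sinks one level
        heapq.heappush(heap, (w1 + w2, tie, leaves1 + leaves2))
        tie += 1
    return depth
```

Take S = abracadabra (m = 11). The Huffman tree encodes it in 23 bits, excluding the preface. Shannon’s lengths give 30 bits, and Hm ≈ 22.44. This is consistent with Hm ≤ Huffman ≤ Shannon ≤ (H + 1)m.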

A minimax tree for a sequence of weights w_1, ..., w_n is a binary tree with two properties. Its leaves, in some order, have weights w_1, ..., w_n. Among all such trees, it minimizes the maximum sum of any leaf’s weight and depth. Golumbic [7] gave an algorithm, similar to Huffman’s, for constructing a minimax tree. The difference lies in how a new root is formed. When we make the two roots with smallest weights, w_i and w_j, into the children of a new root, that new root has weight max(w_i, w_j) + 1 instead of w_i + w_j.

Notice the following. Suppose there exists a binary tree whose leaves, in some order, have depths d_1, ..., d_n. Then a minimax tree T for −d_1, ..., −d_n is such a tree. More generally, the depth of each node in T is bounded above by the negative of its weight. So we can construct a code-tree for Shannon’s algorithm by running Golumbic’s algorithm. We start with roots labelled by the distinct characters in S, with the root labelled a having weight −⌈log(m/#_a(S))⌉.
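The construction just described can be sketched in a few lines. It reuses the Huffman merging loop and changes only the weight of the new root. In the sketch below, we start from the roots −⌈log(m/#_a(S))⌉ and check that every leaf ends up no deeper than Shannon’s bound. Ties are broken by insertion order, and the function names are ours.

```python
import heapq
import math
from collections import Counter


def golumbic_depths(weights):
    """Golumbic's minimax-tree construction for a {leaf: weight} map.

    Same merging loop as Huffman's algorithm, except that the new root gets
    weight max(w_i, w_j) + 1 instead of w_i + w_j. Returns (leaf depths, root weight).
    """
    heap = [(w, i, [a]) for i, (a, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(weights, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        for a in leaves1 + leaves2:
            depth[a] += 1
        heapq.heappush(heap, (max(w1, w2) + 1, tie, leaves1 + leaves2))
        tie += 1
    return depth, heap[0][0]


def shannon_code_tree(s):
    """Leaf depths of a Shannon code-tree built with Golumbic's algorithm."""
    m = len(s)
    bound = {a: math.ceil(math.log2(m / c)) for a, c in Counter(s).items()}
    depth, root_weight = golumbic_depths({a: -l for a, l in bound.items()})
    return depth, bound, root_weight
```

For S = abracadabra, the leaves a, b, r, c, d end at depths 2, 2, 2, 3, 3. These are within Shannon’s bounds 2, 3, 3, 4, 4, and the root’s weight is 0, matching its depth.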

Both Shannon’s algorithm and Huffman’s algorithm have three phases:
1. a first pass over S to count the occurrences of each distinct character;
2. an assignment of codewords to the distinct characters in S (recorded as a preface to the encoding);
3. a second pass over S to encode each character in S using the assigned codeword.

The first phase takes O(m) time, the second O(n log n) time and the third O((H + 1)m) time.

For any static algorithm A, there is a simple dynamic algorithm that recom- 
putes the code-tree from scratch after reading each character. Specifically, for 
$i = 1, \ldots, m$:

1. We keep a running count of the number of occurrences of each distinct character in the current prefix $s_1 \cdots s_{i-1}$ of $S$.

2. We compute the assignment of codewords to characters that would result from applying $A$ to $\bot s_1 \cdots s_{i-1}$, where $\bot$ is a special character not in the alphabet.

3. If $s_i$ occurs in $s_1 \cdots s_{i-1}$, then we encode $s_i$ as the codeword $c_i$ assigned to that character.

4. If $s_i$ does not occur in $s_1 \cdots s_{i-1}$, then we encode $s_i$ as the concatenation $c_i$ of the codeword assigned to $\bot$ and the binary representation of $s_i$'s index in the alphabet.

We can later decode character by character. That is, we can recover $s_1 \cdots s_i$ as soon as we have received $c_1 \cdots c_i$. To see why, assume that we have recovered $s_1 \cdots s_{i-1}$. Then we can compute the assignment of codewords to characters that $A$ used to encode $s_i$. Since the code that $A$ produces is prefix-free, $c_i$ is the only codeword in this assignment that is a prefix of $c_i \cdots c_m$. Thus, we can recover $s_i$ as soon as $c_i$ has been received. This takes the same amount of time as encoding $s_i$.
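A minimal Python sketch of the simple dynamic algorithm, with $A$ taken to be Shannon's algorithm, together with the matching decoder. Several assumptions here are ours:

- $\bot$ is represented by the key `-1` and characters by their alphabet index.
- Codewords come from a canonical prefix code over the Shannon lengths, standing in for a code-tree.
- The decoder is told the length $m$ of $S$, so it knows when to stop.

Because the encoder and decoder recompute the same code from the same prefix, decoding mirrors encoding step by step.

```python
def ceil_log2_ratio(num, den):
    """Exact ceil(log2(num / den)) for integers num >= den >= 1."""
    k = 0
    while den << k < num:
        k += 1
    return k


def canonical_code(lengths):
    """Prefix-free codewords from lengths that satisfy Kraft's inequality."""
    table, code, prev = {}, 0, 0
    for key in sorted(lengths, key=lambda k: (lengths[k], k)):
        code <<= lengths[key] - prev
        table[key] = format(code, "b").zfill(lengths[key]) if lengths[key] else ""
        code += 1
        prev = lengths[key]
    return table


BOTTOM = -1  # key standing in for the special character ⊥


def codebook(counts):
    """Shannon code for ⊥ s_1 ... s_{i-1}, whose length i is the sum of the counts."""
    i = sum(counts.values())
    return canonical_code({k: ceil_log2_ratio(i, c) for k, c in counts.items()})


def index_bits(j, width):
    return format(j, "b").zfill(width) if width else ""


def encode(s, alphabet):
    index = {a: j for j, a in enumerate(alphabet)}
    width = (len(alphabet) - 1).bit_length()  # ceil(log2 n) bits per alphabet index
    counts, out = {BOTTOM: 1}, []
    for ch in s:
        key = index[ch]
        table = codebook(counts)              # recomputed from scratch at each step
        if key in counts:                     # repeated character: its own codeword
            out.append(table[key])
        else:                                 # first occurrence: codeword of ⊥, then index
            out.append(table[BOTTOM] + index_bits(key, width))
            counts[key] = 0
        counts[key] += 1
    return "".join(out)


def decode(bits, alphabet, m):
    width = (len(alphabet) - 1).bit_length()
    counts, out, pos = {BOTTOM: 1}, [], 0
    for _ in range(m):
        inverse = {w: k for k, w in codebook(counts).items()}
        word = ""
        while word not in inverse:            # prefix-free, so the first match is c_i
            word += bits[pos]
            pos += 1
        key = inverse[word]
        if key == BOTTOM:                     # read the alphabet index that follows
            key = int(bits[pos:pos + width], 2) if width else 0
            pos += width
            counts[key] = 0
        counts[key] += 1
        out.append(alphabet[key])
    return "".join(out)


text = "abracadabra"
assert decode(encode(text, "abcdr"), "abcdr", len(text)) == text
```

At $i = 1$ the string $\bot$ has length 1, so $\bot$'s codeword is empty and the first character costs exactly $\lceil \log n \rceil$ bits.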

Faller [4] and Gallager [6] independently gave a dynamic coding algorithm based on Huffman's algorithm. Their algorithm is similar to, but much faster than, the simple dynamic algorithm obtained by adapting Huffman's algorithm as described above. After encoding each character of $S$, their algorithm merely updates the Huffman tree rather than rebuilding it from scratch. Knuth [10] implemented their algorithm so that it uses time proportional to the length of the encoding produced. For this reason, it is sometimes known as Faller-Gallager-Knuth coding; however, it is most often called dynamic Huffman coding. Milidiú, Laber, and Pessoa [14] showed that this version of dynamic Huffman coding uses fewer than $2m$ more bits to encode $S$ than Huffman's algorithm. Vitter [17] gave an improved version that he showed uses fewer than $m$ more bits than Huffman's algorithm. These results imply that Knuth's and Vitter's versions use at most $(H + 2 + r)m + O(n \log n)$ and $(H + 1 + r)m + O(n \log n)$ bits, respectively, to encode $S$, but it is not clear whether these bounds are tight. Both algorithms use $O((H+1)m)$ time.

In this paper, we present a new dynamic algorithm, dynamic Shannon coding. In Section 2, we show that the simple dynamic algorithm obtained by adapting Shannon's algorithm as described above uses at most $(H+1)m + O(n \log m)$ bits and $O(mn \log n)$ time to encode $S$. Section 3 contains our main result: an improved version of dynamic Shannon coding that uses at most $(H+1)m + O(n \log m)$ bits to encode $S$ and only $O((H+1)m + n \log^2 m)$ time. The relationship between Shannon's algorithm and this algorithm is similar to that between Huffman's algorithm and dynamic Huffman coding, but our algorithm is much simpler to analyze than dynamic Huffman coding.

In Section 4, we show that dynamic Shannon coding can be applied to three 
related problems. We give algorithms for dynamic length-restricted coding, dy- 
namic alphabetic coding and dynamic coding with unequal letter costs. Our 
algorithms have better bounds on the length of the encoding produced than 
were previously known. For length-restricted coding, no codeword can exceed a 
given length. For alphabetic coding, the lexicographic order of the codewords 
must be the same as that of the characters. 

Throughout, we make the common simplifying assumption that $m \ge n$. Our model of computation is the unit-cost word RAM with $\Omega(\log m)$-bit words. In this model, ignoring space required for the input and output, all the algorithms mentioned in this paper use $O(|\{a : a \in S\}|)$ words, that is, space proportional to the number of distinct characters in $S$.



2 Analysis of Simple Dynamic Shannon Coding 

In this section, we analyze the simple dynamic algorithm obtained by repeating Shannon's algorithm after each character of the string, as described in the introduction. Since the second phase of Shannon's algorithm, assigning codewords to characters, takes $O(n \log n)$ time, this simple algorithm uses $O(mn \log n)$ time to encode $S$. The rest of this section shows that this algorithm uses at most $(H+1)m + O(n \log m)$ bits to encode $S$.

For $1 \le i \le m$ and each distinct character $a$ that occurs in $\bot s_1 \cdots s_{i-1}$, Shannon's algorithm on $\bot s_1 \cdots s_{i-1}$ assigns to $a$ a codeword of length at most $\lceil \log(i / \#_a(\bot s_1 \cdots s_{i-1})) \rceil$. This fact is key to our analysis.

Let $R$ be the set of indices $i$ such that $s_i$ is a repetition of a character in $s_1 \cdots s_{i-1}$. That is, $R = \{i : 1 \le i \le m,\ s_i \in \{s_1, \ldots, s_{i-1}\}\}$. Our analysis depends on the following technical lemma.

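Lemma 1.
$$\sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})} \le Hm + O(n \log m).$$

Proof. Let
$$L = \sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})}.$$
Notice that $\sum_{i \in R} \log i \le \sum_{i=1}^{m} \log i = \log(m!)$. Also, for $i \in R$, if $s_i$ is the $j$th occurrence of $a$ in $S$, for some $j \ge 2$, then $\log \#_{s_i}(s_1 \cdots s_{i-1}) = \log(j-1)$. Thus,
$$L = \sum_{i \in R} \log i - \sum_{i \in R} \log \#_{s_i}(s_1 \cdots s_{i-1}) \le \log(m!) - \sum_{a \in S} \sum_{j=2}^{\#_a(S)} \log(j-1) = \log(m!) - \sum_{a \in S} \log(\#_a(S)!) + \sum_{a \in S} \log \#_a(S).$$
There are at most $n$ distinct characters in $S$ and each occurs at most $m$ times, so $\sum_{a \in S} \log \#_a(S) \in O(n \log m)$. By Stirling's Formula,
$$x \log x - \frac{x}{\ln 2} \le \log(x!) \le x \log x - \frac{x}{\ln 2} + O(\log x).$$
Thus,
$$L \le m \log m - \frac{m}{\ln 2} - \sum_{a \in S} \left( \#_a(S) \log \#_a(S) - \frac{\#_a(S)}{\ln 2} \right) + O(n \log m).$$
Since $\sum_{a \in S} \#_a(S) = m$,
$$L \le \sum_{a \in S} \#_a(S) \log \frac{m}{\#_a(S)} + O(n \log m).$$
By definition, this is $Hm + O(n \log m)$. □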
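The first inequality in the proof can be checked numerically on a small string. The sketch below (hypothetical helper `lemma1_terms`) computes three quantities: $L$, the intermediate bound $\log(m!) - \sum_a \log(\#_a(S)!) + \sum_a \log \#_a(S)$, and $Hm$. It uses `math.lgamma` for log-factorials. It only illustrates the proof on one input and proves nothing by itself.

```python
import math
from collections import Counter


def log2_factorial(x):
    """log2(x!) computed via the log-gamma function."""
    return math.lgamma(x + 1) / math.log(2)


def lemma1_terms(s):
    """Return (L, intermediate bound, Hm) as defined in the proof of Lemma 1."""
    seen = Counter()
    L = 0.0
    for i, ch in enumerate(s, start=1):
        if seen[ch] > 0:  # i is in R: s_i repeats an earlier character
            L += math.log2(i / seen[ch])  # seen[ch] = #_{s_i}(s_1 ... s_{i-1})
        seen[ch] += 1
    m = len(s)
    bound = log2_factorial(m) - sum(log2_factorial(c) - math.log2(c) for c in seen.values())
    Hm = sum(c * math.log2(m / c) for c in seen.values())
    return L, bound, Hm


L, bound, Hm = lemma1_terms("abracadabra")
print(f"L = {L:.2f}, bound = {bound:.2f}, Hm = {Hm:.2f}")
```

The gap between `bound` and `Hm` is the $O(n \log m)$ term that Stirling's Formula absorbs.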

Using Lemma 1, it is easy to bound the number of bits that simple dynamic 
Shannon coding uses to encode S. 
Theorem 1. Simple dynamic Shannon coding uses at most $(H+1)m + O(n \log m)$ bits to encode $S$.

Proof. If $s_i$ is the first occurrence of that character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then the algorithm encodes $s_i$ as the codeword for $\bot$, which is at most $\lceil \log m \rceil$ bits, followed by the binary representation of $s_i$'s index in the alphabet, which is $\lceil \log n \rceil$ bits. Since there are at most $n$ such characters, the algorithm encodes them all using $O(n \log m)$ bits.

Now, consider the remaining characters in $S$, that is, those characters whose indices are in $R$. In total, the algorithm encodes these using at most
$$\sum_{i \in R} \left\lceil \log \frac{i}{\#_{s_i}(\bot s_1 \cdots s_{i-1})} \right\rceil < m + \sum_{i \in R} \log \frac{i}{\#_{s_i}(s_1 \cdots s_{i-1})}$$
bits. By Lemma 1, this is at most $(H+1)m + O(n \log m)$.

Therefore, in total, this algorithm uses at most $(H+1)m + O(n \log m)$ bits to encode $S$. □



3 Dynamic Shannon Coding 

This section explains how to improve simple dynamic Shannon coding so that it uses at most $(H+1)m + O(n \log m)$ bits and $O((H+1)m + n \log^2 m)$ time to encode the string $S = s_1 \cdots s_m$. The main ideas for this algorithm are using a dynamic minimax tree to store the code-tree, introducing “slack” in the weights, and using background processing to keep the weights updated.

Gagie [5] showed that Faller’s, Gallager’s and Knuth’s techniques for making 
Huffman trees dynamic can be used to make minimax trees dynamic. A dynamic 
minimax tree T supports the following operations: 

- given a pointer to a node $v$, return $v$'s parent, left child, and right child (if they exist);

- given a pointer to a leaf $v$, return $v$'s weight;

- given a pointer to a leaf $v$, increment $v$'s weight;

- given a pointer to a leaf $v$, decrement $v$'s weight;

- and, given a pointer to a leaf $v$, insert a new leaf with the same weight as $v$.

In Gagie's implementation, if the depth of each node is bounded above by the negative of its weight, then each operation on a leaf with weight $-d_i$ takes $O(d_i)$ time. Next, we will show how to use this data structure for fast dynamic Shannon coding.

We maintain the invariant that, after we encode $s_1 \cdots s_{i-1}$, $T$ has one leaf labelled $a$ for each distinct character $a$ in $\bot s_1 \cdots s_{i-1}$. This leaf has weight between $-\lceil \log((i+n)/\#_a(\bot s_1 \cdots s_{i-1})) \rceil$ and $-\lceil \log(\max(i,n)/\#_a(\bot s_1 \cdots s_{i-1})) \rceil$.

Notice that applying Shannon's algorithm to $\bot s_1 \cdots s_{i-1}$ results in a code-tree in which, for $a \in \bot s_1 \cdots s_{i-1}$, the leaf labelled $a$ is of depth at most $\lceil \log(i/\#_a(\bot s_1 \cdots s_{i-1})) \rceil$. It follows that the depth of each node in $T$ is bounded above by the negative of its weight.
Notice that, instead of having just $i$ in the numerator, as we would for simple dynamic Shannon coding, we have at most $i + n$. Thus, this algorithm may assign slightly longer codewords to some characters. We allow this “slack” so that, after we encode each character, we only need to update the weights of at most two leaves. In the analysis, we will show that the extra $n$ only affects low-order terms in the bound on the length of the encoding.

After we encode $s_i$, we ensure that $T$ contains one leaf labelled $s_i$ and that this leaf has weight $-\lceil \log((i+1+n)/\#_{s_i}(\bot s_1 \cdots s_i)) \rceil$. First, if $s_i$ is the first occurrence of that distinct character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then we insert a new leaf labelled $s_i$ into $T$ with the same weight as the leaf labelled $\bot$. Next, we update the weight of the leaf labelled $s_i$. We consider this processing to be in the foreground.

In the background, we use a queue to cycle through the distinct characters that have occurred in the current prefix. For each character that we encode in the foreground, we process one character in the background. When we dequeue a character $a$, if we have encoded precisely $s_1 \cdots s_i$, then we update the weight of the leaf labelled $a$ to be $-\lceil \log((i+1+n)/\#_a(\bot s_1 \cdots s_i)) \rceil$, unless it has this weight already. Since there are always at most $n + 1$ distinct characters in the current prefix ($\bot$ and the $n$ characters in the alphabet), this maintains the following invariant. For $1 \le i \le m$ and $a \in \bot s_1 \cdots s_{i-1}$, immediately after we encode $s_{i-1}$, the leaf labelled $a$ has weight between $-\lceil \log((i+n)/\#_a(\bot s_1 \cdots s_{i-1})) \rceil$ and $-\lceil \log(\max(i,n)/\#_a(\bot s_1 \cdots s_{i-1})) \rceil$. Notice that $\max(i,n) \le i + n \le 2\max(i,n)$ and $\#_a(s_1 \cdots s_{i-1}) \le \#_a(\bot s_1 \cdots s_i) \le 2\#_a(s_1 \cdots s_{i-1}) + 1$. Also, if $s_i$ is the first occurrence of that distinct character in $S$, then $\#_\bot(\bot s_1 \cdots s_i) = \#_\bot(\bot s_1 \cdots s_{i-1})$. It follows that, whenever we update a weight, we use at most one increment or decrement.
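The foreground/background weight maintenance can be simulated without the dynamic minimax tree itself, whose implementation is Gagie's [5] and is not reproduced here. In this sketch (hypothetical helper `simulate_weights`, with `None` standing in for $\bot$), weights are kept in a dictionary. The invariant above is asserted before each character is encoded, and the largest single change made to any weight is reported. That value can then be compared with the claim that each update needs at most one increment or decrement.

```python
from collections import deque


def ceil_log2_ratio(num, den):
    """Exact ceil(log2(num / den)) for integers num >= den >= 1."""
    k = 0
    while den << k < num:
        k += 1
    return k


BOTTOM = None  # stands in for the special character ⊥


def simulate_weights(s, n):
    """Track leaf weights only; n is the alphabet size (at least the number of distinct chars)."""
    counts = {BOTTOM: 1}                           # counts in ⊥ s_1 ... s_{i-1}
    weight = {BOTTOM: -ceil_log2_ratio(1 + n, 1)}  # meets the invariant for i = 1
    queue = deque([BOTTOM])
    max_change = 0
    for i, ch in enumerate(s, start=1):
        # Invariant: -ceil(log((i+n)/#a)) <= weight <= -ceil(log(max(i,n)/#a)).
        for a, c in counts.items():
            assert -ceil_log2_ratio(i + n, c) <= weight[a] <= -ceil_log2_ratio(max(i, n), c)
        # Foreground: new character gets a leaf copying ⊥'s weight; then update s_i.
        if ch not in counts:
            counts[ch] = 0
            weight[ch] = weight[BOTTOM]
            queue.append(ch)
        counts[ch] += 1
        target = -ceil_log2_ratio(i + 1 + n, counts[ch])
        max_change = max(max_change, abs(target - weight[ch]))
        weight[ch] = target
        # Background: refresh one queued character and re-enqueue it.
        a = queue.popleft()
        target = -ceil_log2_ratio(i + 1 + n, counts[a])
        max_change = max(max_change, abs(target - weight[a]))
        weight[a] = target
        queue.append(a)
    return weight, max_change


weights, largest_step = simulate_weights("abracadabra", 5)
```

Because every weight stays below $-\lceil \log(\max(i,n)/\#_a) \rceil$, the values $2^{w}$ sum to at most 1. So a tree whose depths are bounded by the negated weights exists.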

Our analysis of this algorithm is similar to that in Section 2, with two differences. First, we show that weakening the bound on codeword lengths does not significantly affect the bound on the length of the encoding. Second, we show that our algorithm only takes $O((H+1)m + n \log^2 m)$ time. Our analysis depends on the following technical lemma.

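Lemma 2. Suppose $I \subseteq \mathbb{Z}^+$, $|I| \ge n$, and $x_i > 0$ for each $i \in I$. Then
$$\sum_{i \in I} \log \frac{i+n}{x_i} \le \sum_{i \in I} \log \frac{i}{x_i} + n \log(\max I + n).$$

Proof. Let
$$L = \sum_{i \in I} \log \frac{i+n}{x_i} - \sum_{i \in I} \log \frac{i}{x_i} = \log \prod_{i \in I} \frac{i+n}{i}.$$
Let $i_1, \ldots, i_{|I|}$ be the elements of $I$, with $0 < i_1 < \cdots < i_{|I|}$. Then $i_j + n \le i_{j+n}$, so
$$L = \log \frac{\prod_{j=1}^{|I|} (i_j + n)}{\prod_{j=1}^{|I|} i_j} \le \log \frac{\prod_{j=|I|-n+1}^{|I|} (i_j + n)}{\prod_{j=1}^{n} i_j} \le \log (\max I + n)^n = n \log(\max I + n).$$
Therefore,
$$\sum_{i \in I} \log \frac{i+n}{x_i} \le \sum_{i \in I} \log \frac{i}{x_i} + n \log(\max I + n).$$
□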
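A quick numeric illustration of Lemma 2 (hypothetical helper `lemma2_sides`). The set $I$ and the positive values $x_i$ below are arbitrary choices of ours. The $x_i$ cancel in the difference of the two sums, which is why the slack depends only on $I$ and $n$.

```python
import math


def lemma2_sides(I, n, x):
    """Left- and right-hand sides of Lemma 2 for positive integers I and positive x[i]."""
    lhs = sum(math.log2((i + n) / x[i]) for i in I)
    rhs = sum(math.log2(i / x[i]) for i in I) + n * math.log2(max(I) + n)
    return lhs, rhs


I = [1, 2, 3, 5, 8, 13, 21, 34]
x = {i: 1 + i % 4 for i in I}
lhs, rhs = lemma2_sides(I, 3, x)
print(lhs <= rhs)  # True
```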



Using Lemmas 1 and 2, it is easy to bound the number of bits and the time 
dynamic Shannon coding uses to encode S, as follows. 

Theorem 2. Dynamic Shannon coding uses at most $(H+1)m + O(n \log m)$ bits and $O((H+1)m + n \log^2 m)$ time.

Proof. First, we consider the length of the encoding produced. Notice that the algorithm encodes $S$ using at most
$$\sum_{i \in R} \left\lceil \log \frac{i+n}{\#_{s_i}(s_1 \cdots s_{i-1})} \right\rceil + O(n \log m) < m + \sum_{i \in R} \log \frac{i+n}{\#_{s_i}(s_1 \cdots s_{i-1})} + O(n \log m)$$
bits. By Lemmas 1 and 2, this is at most $(H+1)m + O(n \log m)$.

Now, we consider how long this algorithm takes. We will prove separate bounds on the processing done in the foreground and in the background.

If $s_i$ is the first occurrence of that character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then we perform three operations in the foreground when we encode $s_i$:

- we output the codeword for $\bot$, which is at most $\lceil \log(i+n) \rceil$ bits;
- we output the index of $s_i$ in the alphabet, which is $\lceil \log n \rceil$ bits;
- and we insert a new leaf labelled $s_i$ and update its weight to be $-\lceil \log(i+1+n) \rceil$.

In total, these take $O(\log(i+n)) \subseteq O(\log m)$ time. Since there are at most $n$ such characters, the algorithm encodes them all using $O(n \log m)$ time.

For $i \in R$, we perform at most two operations in the foreground when we encode $s_i$. We output the codeword for $s_i$, which is of length at most $\lceil \log((i+n)/\#_{s_i}(s_1 \cdots s_{i-1})) \rceil$; and, if necessary, we increment the weight of the leaf labelled $s_i$. In total, these take $O(\log((i+n)/\#_{s_i}(s_1 \cdots s_{i-1})))$ time.

For $1 \le i \le m$, we perform at most two operations in the background when we encode $s_i$. We dequeue a character $a$; if necessary, we decrement the weight of the leaf labelled $a$; and we re-enqueue $a$. These take $O(1)$ time if we do not decrement the weight of the leaf labelled $a$ and $O(\log m)$ time if we do.

Suppose $s_i$ is the first occurrence of that distinct character in $S$. Then the leaf $v$ labelled $s_i$ is inserted into $T$ with weight $-\lceil \log(i+n) \rceil$. Also, $v$'s weight is never less than $-\lceil \log(m+1+n) \rceil$. Decrementing $v$'s weight from $w$ to $w-1$ and incrementing it from $w-1$ to $w$ both take $O(-w)$ time. So we spend the same amount of time decrementing $v$'s weight in the background as we do incrementing it in the foreground, except possibly for the time to decrease $v$'s weight from $-\lceil \log(i+n) \rceil$ to $-\lceil \log(m+1+n) \rceil$. Thus, we spend $O(\log^2 m)$ more time decrementing $v$'s weight than we do incrementing it. Since there are at most $n$ distinct characters in $S$, in total, this algorithm takes
$$O\left(m + \sum_{i \in R} \log \frac{i+n}{\#_{s_i}(s_1 \cdots s_{i-1})} + n \log^2 m\right)$$
time. It follows from Lemmas 1 and 2 that this is $O((H+1)m + n \log^2 m)$. □



4 Variations on Dynamic Shannon Coding 

In this section, we show how to efficiently implement variations of dynamic Shannon coding for dynamic length-restricted coding, dynamic alphabetic coding and dynamic coding with unequal letter costs. Abrahams [1] surveys static algorithms for these and similar problems, but there has been relatively little work on dynamic algorithms for these problems.

We use dynamic minimax trees for length-restricted dynamic Shannon coding. For alphabetic dynamic Shannon coding, we dynamize Mehlhorn's version of Shannon's algorithm. For dynamic Shannon coding with unequal letter costs, we dynamize Krause's version.



4.1 Length-Restricted Dynamic Shannon Coding 

For length-restricted coding, we are given a bound and cannot use a codeword 
whose length exceeds this bound. Length-restricted coding is useful, for example, 
for ensuring that each codeword fits in one machine word. Liddell and Moffat [12] 
gave a length-restricted dynamic coding algorithm that works well in practice, 
but it is quite complicated and they did not prove bounds on the length of 
the encoding it produces. We show how to length-restrict dynamic Shannon 
coding without significantly increasing the bound on the length of the encoding 
produced. 



Theorem 3. For any fixed integer $\ell \ge 1$, dynamic Shannon coding can be adapted so that it uses:

- at most $2\lceil \log n \rceil + \ell$ bits to encode the first occurrence of each distinct character in $S$;
- at most $\lceil \log n \rceil + \ell$ bits to encode each remaining character in $S$;
- at most $\left(H + 1 + \frac{1}{(2^\ell - 1)\ln 2}\right)m + O(n \log m)$ bits in total;
- and $O((H+1)m + n \log^2 m)$ time.



Proof. We modify the algorithm presented in Section 3 in two ways. First, we remove the leaf labelled $\bot$ after all of the characters in the alphabet have occurred in $S$. Second, we change how we calculate weights for the dynamic minimax tree: whenever we would use a weight of the form $-\lceil \log x \rceil$, we smooth it by instead using
$$-\left\lceil \log \frac{2^\ell}{(2^\ell - 1)/x + 1/n} \right\rceil \ge -\min\left( \left\lceil \log \frac{2^\ell x}{2^\ell - 1} \right\rceil,\ \lceil \log n \rceil + \ell \right).$$
With these modifications, no leaf in the minimax tree is ever of depth greater than $\lceil \log n \rceil + \ell$. Since
$$\left\lceil \log \frac{2^\ell x}{2^\ell - 1} \right\rceil < \log x + 1 + \log\left(1 + \frac{1}{2^\ell - 1}\right) \le \log x + 1 + \frac{1}{(2^\ell - 1)\ln 2},$$
essentially the same analysis as for Theorem 2 shows this algorithm uses at most $\left(H + 1 + \frac{1}{(2^\ell - 1)\ln 2}\right)m + O(n \log m)$ bits in total, and $O((H+1)m + n \log^2 m)$ time. □
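The smoothed weight can be computed exactly with rational arithmetic. The sketch below (hypothetical helpers `ceil_log2` and `smoothed_length`) returns the negated weight, that is, the codeword-length bound $\lceil \log(2^\ell / ((2^\ell-1)/x + 1/n)) \rceil$. The result can be checked against the cap $\lceil \log n \rceil + \ell$. One way to read the formula is that it mixes the probability $1/x$ with the uniform probability $1/n$, using weights $1 - 2^{-\ell}$ and $2^{-\ell}$.

```python
from fractions import Fraction


def ceil_log2(r):
    """Exact ceil(log2(r)) for a rational r >= 1."""
    k = 0
    while 2 ** k < r:
        k += 1
    return k


def smoothed_length(x, n, ell):
    """ceil(log2(2^ell / ((2^ell - 1)/x + 1/n))), the smoothed replacement for ceil(log2 x)."""
    x = Fraction(x)
    return ceil_log2(Fraction(2 ** ell) / ((2 ** ell - 1) / x + Fraction(1, n)))


print(smoothed_length(1000, 4, 2))  # 4
```

With $n = 4$ and $\ell = 2$, a weight that would have been $-\lceil \log 1000 \rceil = -10$ becomes $-4$, exactly the cap $\lceil \log n \rceil + \ell$.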



It is straightforward to prove a similar theorem in which the number of bits used to encode $s_i$ with $i \in R$ is bounded above by $\lceil \log(|\{a : a \in S\}| + 1) \rceil + \ell + 1$ instead of $\lceil \log n \rceil + \ell$. That is, we can make the bound in terms of the number of distinct characters in $S$ instead of the size of the alphabet. To do this, we modify the algorithm again so that it stores a counter $n_i$ of the number of distinct characters that have occurred in the current prefix. Whenever we would use $n$ in a formula to calculate a weight, we use $2(n_i + 1)$ instead.



4.2 Alphabetic Dynamic Shannon Coding 

For alphabetic coding, the lexicographic order of the codewords must always 
be the same as the lexicographic order of the characters to which they are as- 
signed. Alphabetic coding is useful, for example, because we can compare en- 
coded strings without decoding them. Although there is an alphabetic version 
of minimax trees [9], it cannot be efficiently dynamized [5]. Mehlhorn [13] gen- 
eralized Shannon’s algorithm to obtain an algorithm for alphabetic coding. In 
this section, we dynamize Mehlhorn’s algorithm. 

Theorem 4 (Mehlhorn, 1977). There exists an alphabetic prefix-free code such that, for each character $a$ in the alphabet, the codeword for $a$ is of length $\lceil \log((m+n)/(\#_a(S)+1)) \rceil + 1$.

Proof. Let $a_1, \ldots, a_n$ be the characters in the alphabet in lexicographic order. For $1 \le i \le n$, let
$$f(a_i) = \frac{\#_{a_i}(S) + 1}{2(m+n)} + \sum_{j=1}^{i-1} \frac{\#_{a_j}(S) + 1}{m+n}.$$
For $1 \le i \ne i' \le n$, notice that $|f(a_i) - f(a_{i'})| > \frac{\#_{a_i}(S) + 1}{2(m+n)}$. Therefore, the first
$$\left\lceil \log \frac{m+n}{\#_{a_i}(S) + 1} \right\rceil + 1$$
bits of the binary representation of $f(a_i)$ suffice to distinguish it. Let this sequence of bits be the codeword for $a_i$. □
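Mehlhorn's construction translates directly into code with exact fractions. In the sketch below (hypothetical helper `mehlhorn_code`), `counts` maps characters to their frequencies in $S$, and characters absent from `counts` have frequency 0. `alphabet` lists all $n$ characters in lexicographic order.

```python
from fractions import Fraction


def ceil_log2(r):
    """Exact ceil(log2(r)) for a rational r >= 1."""
    k = 0
    while 2 ** k < r:
        k += 1
    return k


def binary_prefix(x, bits):
    """First `bits` bits of the binary expansion of a rational 0 <= x < 1."""
    out = []
    for _ in range(bits):
        x *= 2
        bit = 1 if x >= 1 else 0
        out.append(str(bit))
        x -= bit
    return "".join(out)


def mehlhorn_code(counts, alphabet):
    """Alphabetic prefix-free code of Theorem 4 (alphabet in lexicographic order)."""
    m = sum(counts.get(a, 0) for a in alphabet)
    n = len(alphabet)
    code, before = {}, Fraction(0)
    for a in alphabet:
        p = Fraction(counts.get(a, 0) + 1, m + n)  # (#_a(S) + 1) / (m + n)
        f = before + p / 2                         # f(a_i): midpoint of a_i's interval
        code[a] = binary_prefix(f, ceil_log2(1 / p) + 1)
        before += p
    return code


print(mehlhorn_code({"a": 3, "c": 1}, "abc"))  # {'a': '01', 'b': '1010', 'c': '110'}
```

With alphabet a, b, c and frequencies 3, 0, 1, the codewords are lexicographically ordered and prefix-free. Their lengths are $\lceil \log(7/4) \rceil + 1 = 2$, $\lceil \log 7 \rceil + 1 = 4$ and $\lceil \log(7/2) \rceil + 1 = 3$.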



Repeating Mehlhorn's algorithm after each character of $S$, as described in the introduction, is a simple algorithm for alphabetic dynamic Shannon coding. Notice that we always assign a codeword to every character in the alphabet; thus, we do not need to prepend $\bot$ to the current prefix of $S$. This algorithm uses at most $(H+2)m + O(n \log m)$ bits and $O(mn)$ time to encode $S$.

To make this algorithm more efficient, after encoding each character of S, 
instead of computing an entire code-tree, we only compute the codeword for 
the next character in S. We use an augmented splay tree [16] to compute the 
necessary partial sums. 

Theorem 5. Alphabetic dynamic Shannon coding uses at most $(H+2)m + O(n \log m)$ bits and $O((H+1)m)$ time.

Proof. We keep an augmented splay tree $T$ and maintain the invariant that, after encoding $s_1 \cdots s_{i-1}$, there is a node $v_a$ in $T$ for each distinct character $a$ in $s_1 \cdots s_{i-1}$. The node $v_a$'s key is $a$; it stores $a$'s frequency in $s_1 \cdots s_{i-1}$ and the sum of the frequencies of the characters in $v_a$'s subtree in $T$.

To encode $s_i$, we use $T$ to compute the partial sum
$$\frac{\#_{s_i}(s_1 \cdots s_{i-1})}{2} + \sum_{a_j < s_i} \#_{a_j}(s_1 \cdots s_{i-1}),$$
where $a_j < s_i$ means that $a_j$ is lexicographically less than $s_i$. From this, we compute the codeword for $s_i$, that is, the first
$$\left\lceil \log \frac{i-1+n}{\#_{s_i}(s_1 \cdots s_{i-1}) + 1} \right\rceil + 1$$
bits of the binary representation of
$$\frac{\#_{s_i}(s_1 \cdots s_{i-1}) + 1}{2(i-1+n)} + \sum_{a_j < s_i} \frac{\#_{a_j}(s_1 \cdots s_{i-1}) + 1}{i-1+n}.$$
If $s_i$ is the first occurrence of that character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then we insert a node $v_{s_i}$ into $T$. In both cases, we update the information stored at the ancestors of $v_{s_i}$ and splay $v_{s_i}$ to the root.

Essentially the same analysis as for Theorem 2 shows this algorithm uses at most $(H+2)m + O(n \log m)$ bits. By the Static Optimality theorem [16], it uses $O((H+1)m)$ time. □
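A sketch of the per-character computation follows. It swaps in a Fenwick (binary indexed) tree over alphabet positions for the augmented splay tree. This substitution is ours: it gives the same partial sums in $O(\log n)$ time per character, but not the bound that the Static Optimality theorem yields. Each position stores the current frequency plus 1, so the tree's total is $i - 1 + n$ before $s_i$ is encoded. All helper names are hypothetical.

```python
from fractions import Fraction


def ceil_log2(r):
    """Exact ceil(log2(r)) for a rational r >= 1."""
    k = 0
    while 2 ** k < r:
        k += 1
    return k


def binary_prefix(x, bits):
    """First `bits` bits of the binary expansion of a rational 0 <= x < 1."""
    out = []
    for _ in range(bits):
        x *= 2
        bit = 1 if x >= 1 else 0
        out.append(str(bit))
        x -= bit
    return "".join(out)


class Fenwick:
    """Binary indexed tree over positions 0..size-1 supporting prefix sums."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, pos, delta):
        pos += 1
        while pos < len(self.tree):
            self.tree[pos] += delta
            pos += pos & -pos

    def prefix(self, count):
        """Sum of the entries at positions 0..count-1."""
        total = 0
        while count > 0:
            total += self.tree[count]
            count -= count & -count
        return total


def alphabetic_dynamic_encode(s, alphabet):
    index = {a: j for j, a in enumerate(alphabet)}
    n = len(alphabet)
    sums = Fenwick(n)
    for j in range(n):
        sums.add(j, 1)                        # frequency 0 is stored as 0 + 1
    out = []
    for ch in s:
        k = index[ch]
        total = sums.prefix(n)                # (i - 1) + n
        below = sums.prefix(k)                # sum over a_j < s_i of (#_{a_j} + 1)
        weight = sums.prefix(k + 1) - below   # #_{s_i}(s_1 ... s_{i-1}) + 1
        f = Fraction(2 * below + weight, 2 * total)
        out.append(binary_prefix(f, ceil_log2(Fraction(total, weight)) + 1))
        sums.add(k, 1)
    return "".join(out)


print(alphabetic_dynamic_encode("ab", "abc"))  # 001101
```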



4.3 Dynamic Shannon Coding with Unequal Letter Costs 

It may be that one code letter costs more than another. For example, sending a dash by telegraph takes longer than sending a dot. Shannon [15] proved a lower bound of $Hm \ln(2)/C$ on the cost of the encoding for all algorithms, whether prefix-free or not. Here the channel capacity $C$ is the largest real root of $e^{-\mathrm{cost}(0) \cdot x} + e^{-\mathrm{cost}(1) \cdot x} = 1$, and $e \approx 2.71$ is the base of the natural logarithm. Krause [11] generalized Shannon's algorithm for the case with unequal positive letter costs. In this section, we dynamize Krause's algorithm.
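The channel capacity $C$ has no closed form in general, but it is easy to compute numerically: for positive costs, $e^{-\mathrm{cost}(0) \cdot x} + e^{-\mathrm{cost}(1) \cdot x}$ is strictly decreasing and equals 2 at $x = 0$. A bisection sketch (hypothetical helper `capacity`):

```python
import math


def capacity(cost0, cost1):
    """Largest real root C of exp(-cost0*x) + exp(-cost1*x) = 1, for positive costs."""
    def g(x):
        return math.exp(-cost0 * x) + math.exp(-cost1 * x) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) >= 0:      # grow the bracket until the sum drops below 1
        hi *= 2
    for _ in range(200):   # g is decreasing, so keep the sign change inside [lo, hi]
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(capacity(1, 1), math.log(2))
```

With $\mathrm{cost}(0) = \mathrm{cost}(1) = 1$ this gives $C = \ln 2$, so the lower bound $Hm \ln(2)/C$ reduces to $Hm$. With costs 1 and 2 it gives $C = \ln((1+\sqrt{5})/2)$.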

Theorem 6 (Krause, 1962). Suppose $\mathrm{cost}(0)$ and $\mathrm{cost}(1)$ are constants with $0 < \mathrm{cost}(0) \le \mathrm{cost}(1)$. Then there exists a prefix-free code such that, for each character $a$ in the alphabet, the codeword for $a$ has cost less than $\frac{\ln(m/\#_a(S))}{C} + \mathrm{cost}(1)$.
Proof. Let $a_1, \ldots, a_k$ be the characters in $S$ in non-increasing order by frequency. For $1 \le i \le k$, let
$$f(a_i) = \sum_{j=1}^{i-1} \frac{\#_{a_j}(S)}{m} < 1.$$
Let $b(a_i)$ be the following binary string, where $x_0 = 0$ and $y_0 = 1$. For $j \ge 1$:

- If $f(a_i)$ is in the first $e^{-\mathrm{cost}(0) \cdot C}$ fraction of the interval $[x_{j-1}, y_{j-1})$, then the $j$th bit of $b(a_i)$ is 0, and $x_j$ and $y_j$ are such that $[x_j, y_j)$ is the first $e^{-\mathrm{cost}(0) \cdot C}$ fraction of $[x_{j-1}, y_{j-1})$.
- Otherwise, the $j$th bit of $b(a_i)$ is 1, and $x_j$ and $y_j$ are such that $[x_j, y_j)$ is the last $e^{-\mathrm{cost}(1) \cdot C}$ fraction of $[x_{j-1}, y_{j-1})$.

Notice that the cost to encode the $j$th bit of $b(a_i)$ is exactly $\ln\left(\frac{y_{j-1} - x_{j-1}}{y_j - x_j}\right)/C$. It follows that the total cost to encode the first $j$ bits of $b(a_i)$ is $\ln\left(\frac{1}{y_j - x_j}\right)/C$.

For $1 \le i \ne i' \le k$, notice that $|f(a_i) - f(a_{i'})| \ge \#_{a_i}(S)/m$. Therefore, if $y_j - x_j < \#_{a_i}(S)/m$, then the first $j$ bits of $b(a_i)$ suffice to distinguish it. So the shortest prefix of $b(a_i)$ that suffices to distinguish $b(a_i)$ has cost less than
$$\frac{\ln\left(e^{\mathrm{cost}(1) \cdot C} \, m / \#_{a_i}(S)\right)}{C} = \frac{\ln(m/\#_{a_i}(S))}{C} + \mathrm{cost}(1).$$
Let this sequence of bits be the codeword for $a_i$. □
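The interval-splitting construction in the proof can be sketched in floating point as follows. The helper names are hypothetical, and `capacity` computes $C$ by bisection. Using floating-point interval endpoints is an assumption of the sketch. If an interval width exactly ties $\#_{a_i}(S)/m$, rounding could change where the sketch stops, so exact arithmetic would be safer in a real implementation.

```python
import math


def capacity(cost0, cost1):
    """Largest real root C of exp(-cost0*x) + exp(-cost1*x) = 1, for positive costs."""
    def g(x):
        return math.exp(-cost0 * x) + math.exp(-cost1 * x) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) >= 0:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def krause_code(counts, cost0, cost1):
    """Prefix-free code following the proof of Theorem 6 (floating-point sketch)."""
    C = capacity(cost0, cost1)
    share0 = math.exp(-cost0 * C)          # bit 0 keeps this leading fraction of the interval
    m = sum(counts.values())
    code, cumulative = {}, 0
    for a in sorted(counts, key=lambda ch: -counts[ch]):  # non-increasing frequency
        f = cumulative / m                 # f(a_i) = sum over j < i of #_{a_j}(S) / m
        x, y = 0.0, 1.0
        bits = []
        while y - x >= counts[a] / m:      # stop once the interval is narrower than #_a(S)/m
            split = x + (y - x) * share0
            if f < split:
                bits.append("0")
                y = split
            else:
                bits.append("1")
                x = split
        code[a] = "".join(bits)
        cumulative += counts[a]
    return code


print(krause_code({"a": 5, "b": 3, "c": 2}, 1, 2))  # {'a': '00', 'b': '01', 'c': '101'}
```

For frequencies 5, 3, 2 and letter costs 1 and 2, the codewords cost 2, 3 and 5. Each is below the corresponding bound $\ln(m/\#_a(S))/C + \mathrm{cost}(1)$.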



Repeating Krause's algorithm after each character of $S$, as described in the introduction, is a simple algorithm for dynamic Shannon coding with unequal letter costs. This algorithm produces an encoding of $S$ with cost at most $\left(\frac{H \ln 2}{C} + \mathrm{cost}(1)\right) m + O(n \log m)$ in $O(mn)$ time.

As in Subsection 4.2, we can make this simple algorithm more efficient by only 
computing the codewords we need, using an augmented dynamic binary search 
tree. However, instead of lexicographic order, we want to keep the characters 
in non-increasing order by frequency in the current prefix. Since this order may 
change as we encode, we cannot use the Static Optimality theorem again: we 
use an augmented AVL tree [2] instead of a splay tree. 

Theorem 7. Suppose $\mathrm{cost}(0)$ and $\mathrm{cost}(1)$ are constants with $0 < \mathrm{cost}(0) \le \mathrm{cost}(1)$. Then dynamic Shannon coding produces an encoding of $S$ with cost at most $\left(\frac{H \ln 2}{C} + \mathrm{cost}(1)\right) m + O(n \log m)$ in $O(m \log n)$ time.

Proof. We keep an augmented AVL tree $T$ and maintain the invariant that, after encoding $s_1 \cdots s_{i-1}$, there is a node $v_a$ in $T$ for each distinct character $a$ in $\bot s_1 \cdots s_{i-1}$. The node $v_a$'s key is $a$'s frequency in $s_1 \cdots s_{i-1}$; it stores $a$ and the sum of the frequencies of the characters in $v_a$'s subtree.

If $s_i$ occurs in $s_1 \cdots s_{i-1}$, then we compute the sum of the frequencies of the characters stored to the left of $v_{s_i}$ in $T$. From this, we compute the codeword for $s_i$ (i.e., the codeword that Krause's algorithm on $\bot s_1 \cdots s_{i-1}$ would assign to $s_i$). Finally, we delete $v_{s_i}$ and re-insert it with frequency 1 greater. In total, this takes $O(\log n)$ time.

If $s_i$ is the first occurrence of that character in $S$ (i.e., $i \in \{1, \ldots, m\} - R$), then we compute the codeword for $\bot$ as described above. We encode $s_i$ as the codeword assigned to $\bot$ followed by the binary representation of $s_i$'s index in the alphabet. This encoding of $s_i$ has cost $O(\mathrm{cost}(1) \cdot (\log m + \log n)) \subseteq O(\log m)$. Then, we insert a node $v_{s_i}$ into $T$. In total, this takes $O(\log n)$ time.

Essentially the same analysis as for Theorem 2 shows this algorithm produces an encoding of $S$ with cost at most $\left(\frac{H \ln 2}{C} + \mathrm{cost}(1)\right) m + O(n \log m)$. □
Abstract. We present a Lagrangian relaxation technique to solve a class of linear programs with negative coefficients in the objective function and the constraints. We apply this technique to solve (the dual of) covering linear programs with upper bounds on the variables: $\min\{c^T x \mid Ax \ge b, x \le u, x \ge 0\}$, where $c, u \in \mathbb{R}^m_+$, $b \in \mathbb{R}^n_+$ and $A \in \mathbb{R}^{n \times m}_+$ have non-negative entries. We obtain a strictly feasible, $(1+\epsilon)$-approximate solution by making $O(m\epsilon^{-2} \log m + \min\{n, \log\log C\})$ calls to an oracle that finds the most-violated constraint. Here $C$ is the largest entry in $c$ or $u$, $m$ is the number of variables, and $n$ is the number of covering constraints. Our algorithm follows naturally from the algorithm for the fractional packing problem and improves the previous best bound of $O(m\epsilon^{-2} \log(mC))$ given by Fleischer [1]. Also, for a fixed $\epsilon$, if the number of covering constraints is polynomial, our algorithm makes a number of oracle calls that is strongly polynomial.



1 Introduction 

In the last couple of decades, there has been extensive research on solving linear programs (LPs) efficiently, albeit approximately, using Lagrangian relaxation. Starting from the work of Shahrokhi & Matula [2] on multicommodity flows, a series of papers proposed several algorithms for packing and covering LPs, including those of Plotkin et al. [3], Luby & Nisan [4], Grigoriadis & Khachiyan [5,6], Young [7,8], Garg & Könemann [9], and Fleischer [10]. Refer to Bienstock [11] for a survey of this field. All of these algorithms rely crucially on the non-negativity of the coefficients in the objective function and the constraints.

In this paper we show how Lagrangian relaxation can be used to solve a class of LPs with negative coefficients in the objective function and the constraints. The basic idea behind this approach is as follows. Suppose we would like to solve an LP of the form
$$\max\ c^T x \quad \text{s.t.} \quad Ax \le b, \quad x \ge 0 \qquad (1)$$
where the entries in $A$ and $c$ may be negative. Suppose also that there exists an optimum solution $x^*$ to (1) such that $c^T x^* \ge 0$ and $Ax^* \ge 0$. In such a case, we can add constraints $c^T x \ge 0$ and $Ax \ge 0$ to (1) without affecting the optimum solution. The resulting linear program can then be thought of as a fractional packing problem
$$\max\{c^T x \mid Ax \le b,\ x \in P\}$$
where $P = \{x \ge 0 \mid c^T x \ge 0,\ Ax \ge 0\}$. Such a packing problem can be solved using algorithms similar to those of Plotkin et al. [3], Grigoriadis & Khachiyan [12], Garg & Könemann [9], and Jansen & Zhang [13], provided one has an oracle to minimize linear functions over $P$.

* Supported in part by a fellowship from Infosys Technologies Ltd., Bangalore.



1.1 Our Contribution and Previous Work 



We apply the technique mentioned above to (the dual of) a covering LP with upper bounds on the variables:
$$\min\ c^T x \quad \text{s.t.} \quad Ax \ge b, \quad x \le u, \quad x \ge 0 \qquad (2)$$
where $c, u \in \mathbb{R}^m_+$, $b \in \mathbb{R}^n_+$ and $A \in \mathbb{R}^{n \times m}_+$ have non-negative entries. For a positive integer $k$, let $[k] = \{1, \ldots, k\}$. To solve (2), we assume that we have an oracle that, given $x \in \mathbb{R}^m_+$, finds the most-violated constraint. Formally,
$$\text{given } x \in \mathbb{R}^m_+, \text{ the oracle finds } \min_{j \in [n]} \frac{A_j x}{b_j} \qquad (3)$$
where $A_j$ denotes the $j$th row of $A$. The dual of (2) has negative entries in the constraint matrix; we reduce it to a packing problem and prove the following theorem.
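When the instance is given explicitly, oracle (3) is a single pass over the rows of $A$. Explicit input is an assumption here, since the text only requires oracle access. A minimal sketch with 0-based row indices and dense lists (hypothetical helper `most_violated`):

```python
def most_violated(A, b, x):
    """Return (j, ratio) where row j minimizes A_j x / b_j (0-based row index)."""
    best_j, best_ratio = None, float("inf")
    for j, (row, b_j) in enumerate(zip(A, b)):
        ratio = sum(a_ji * x_i for a_ji, x_i in zip(row, x)) / b_j
        if ratio < best_ratio:
            best_j, best_ratio = j, ratio
    return best_j, best_ratio


print(most_violated([[1, 0], [1, 1], [0, 2]], [1, 3, 1], [1, 1])[0])  # 1
```

A returned ratio below 1 means that covering constraint is violated at $x$. With $x = u$, it means no point with $0 \le x \le u$ can satisfy it.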

Theorem 1. There exists an algorithm that, given an error parameter $\omega \in (0, 1)$, either computes a strictly feasible, $\exp(\omega)$-approximation to (2), or proves that (2) is infeasible. The algorithm makes $O(m\omega^{-2} \log m + \min\{n, \log\log C\})$ calls to the oracle (3), where
$$C = \frac{\max_{i \in [m]} c_i u_i}{\min_{i \in [m]:\, c_i u_i > 0} c_i u_i}.$$

The LP (2) is a special case of the mixed packing and covering LP $\min\{c^T x \mid Ax \ge b, Px \le u, x \ge 0\}$, where $c$, $b$, $u$, $A$ and $P$ have non-negative entries. The known algorithms (Young [8]) for mixed LPs, however, compute only approximately feasible solutions, in which either the packing or the covering constraints are violated by a factor of $(1 + \epsilon)$. Our algorithm, on the other hand, computes a strictly feasible solution. Fleischer [1] recently presented an algorithm to compute an $\exp(\omega)$-approximation to (2) by making $O(m\omega^{-2} \log(mC))$ calls¹ to the oracle (3). Theorem 1 thus improves her running time, especially when $C$ is large or $n$ is polynomial (for example, when the instance is explicitly given).

We note that an approximate version of the oracle (3) is not strong enough to distinguish between feasibility and infeasibility of the given instance. Such an oracle computes a row $j' \in [n]$ such that $A_{j'}x/b_{j'} \le (1 + \epsilon)\min_{j \in [n]} A_j x/b_j$. For example, suppose all covering constraints are approximately satisfied by $x = u$, i.e., $Au \ge b/(1+\epsilon)$. Then the algorithm may output $x = u$ as a solution even though it may not satisfy $Au \ge b$, in which case the instance is infeasible. Our algorithm follows naturally from the packing algorithm; in fact, it can be thought of as an alternate description of Fleischer's algorithm [1].

¹ Fleischer [1] defines $C$ as $\max_{i \in [m]} c_i / \min_{i: c_i > 0} c_i$. However, the running time of her algorithm depends, in fact, on $C$ as defined in Theorem 1. Note that this value of $C$ is invariant under the scaling of variables.

2 The Covering LP with Upper Bounds on the Variables 

To simplify the presentation, we modify the given instance of (2) as follows.

It is no loss of generality to assume that all entries of $c$, $u$, and $b$ are positive:

- If $c_i = 0$, we can reduce the problem by setting $x_i = u_i$ and replacing each $b_j$ with $b_j - A_{ji}u_i$.
- If $u_i = 0$, then clearly $x_i = 0$ in any feasible solution, and hence we can remove this variable.
- Similarly, if $b_j = 0$, then the $j$th covering constraint is trivially satisfied and can be removed.

We can also assume that the objective function $c = \mathbf{1}$ is the vector of all ones. This can be achieved by working with new variables $x'_i = c_i x_i$, replacing each $A_{ji}$ with $A_{ji}/c_i$ and each $u_i$ with $c_i u_i$. We finally assume that $\min_{i \in [m]} u_i = 1$. This is achieved by dividing each entry of $u$ and $b$ by $\min_{i \in [m]} u_i$. While doing this, it is not necessary to scale the vector $c$ by $\min_{i \in [m]} u_i$, since that amounts to just scaling the optimum value.

Observe that after these modifications, we have $\max_{i \in [m]} u_i = C$. Note that the oracle (3) for the modified instance can be obtained from the given oracle for the original instance, and a solution for the original instance can be obtained from a solution of the modified instance. Since $b$ and $c$ have only positive entries, the optimum value of (2) is positive.

Before solving this linear program, we make a call to oracle (3) with $x = u$ to find a row $j \in [n]$ that minimizes $A_j u/b_j$. If $A_j u < b_j$, then the $j$th covering constraint cannot be satisfied even if we set each variable $x_i$ to its upper bound $u_i$. In this case, we output "infeasible". Therefore, in what follows, we assume that $Au \ge b$, i.e., the given instance is feasible.
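The rescaling and the feasibility pre-check can be sketched as follows. The sketch assumes $c$, $u$ and $b$ are already strictly positive, omits the reductions for zero entries, and uses exact fractions. The helpers `normalize` and `covers_at_upper_bounds` are hypothetical; the second one is the pre-check $Au \ge b$.

```python
from fractions import Fraction


def normalize(A, b, c, u):
    """Rescale an instance of (2) with positive c, u, b so that c = 1 and min(u) = 1.

    Uses the new variables x'_i = c_i x_i; A is a list of rows, one per covering
    constraint. Returns the rescaled (A, b, u); afterwards C = max(u).
    """
    A2 = [[Fraction(a_ji) / c_i for a_ji, c_i in zip(row, c)] for row in A]
    u2 = [Fraction(c_i) * u_i for c_i, u_i in zip(c, u)]
    scale = min(u2)
    return A2, [Fraction(b_j) / scale for b_j in b], [u_i / scale for u_i in u2]


def covers_at_upper_bounds(A, b, u):
    """Pre-check from the text: the instance is feasible iff A u >= b."""
    return all(sum(a_ji * u_i for a_ji, u_i in zip(row, u)) >= b_j for row, b_j in zip(A, b))


A2, b2, u2 = normalize([[1, 2], [3, 1]], [4, 6], [2, 1], [1, 3])
print(u2)  # [Fraction(1, 1), Fraction(3, 2)]
```

For $c = (2, 1)$ and $u = (1, 3)$, the rescaled upper bounds are $(1, 3/2)$, so $C = 3/2$, matching $\max_i c_i u_i / \min_i c_i u_i$.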

2.1 Reduction to the Packing Problem 

Consider the following dual LP of (2).
$$\max\ b^T y - u^T z \quad \text{s.t.} \quad A^T y - z \le \mathbf{1}, \quad y, z \ge 0 \qquad (4)$$
Although (4) has negative entries in the constraints and the objective function, the following lemma proves that it satisfies the conditions discussed in Section 1.

Lemma 1. Any optimum solution $(y^*, z^*)$ to (4) satisfies $b^T y^* - u^T z^* \ge 0$ and $A^T y^* - z^* \ge 0$.

Proof. Since the optimum value of (2) is positive, by LP duality, $b^T y^* - u^T z^*$ is also positive. Now the only constraints on $z$ are $z_i \ge 0$ and $z_i \ge (A^T y)_i - 1$. Once $y = y^*$ is fixed, in order to maximize the objective function, we would set $z_i$ to the smallest possible value, i.e., $\max\{0, (A^T y^*)_i - 1\}$, and we have $(A^T y^*)_i - z^*_i \ge 0$.
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Thus we can add constraints y — vJ z > 0 and y — z > 0 to (4) without affecting 
its optimum value. The resulting LP is 

max y — vJ z 
s.t. A^y — z <1 

b^ y — vJ z > 0 (5) 

A^y -z>0 

y,z>0 

We now define an instance of the packing problem. Define a polytope P C and 

functions /^ : P — >■ R_|_ , i € [m] as follows. 

p = {{y, ^) > 0 I A^y - z>0,b^y- = 1}, 

My,z) = {A^y)i - Zi foTi e [m]. 

Now consider the packing problem 

min max fi{y,z). (7) 

{y,z)eP ie[m] 

It is easy to see that (5) has an optimum value $\lambda$ if and only if (7) has an optimum value $1/\lambda$. Furthermore, an $\exp(\omega)$-approximate solution of (7) can be scaled to obtain an $\exp(\omega)$-approximate solution of (5). It is therefore sufficient to obtain an $\exp(\omega)$ approximation to (7).
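The scaling between (7) and (5) can be checked numerically. The Python sketch below (matrix stored as a list of rows; the names `AT_times` and `to_dual_solution` are ours) takes a point $(y, z) \in P$, computes $\lambda(y, z) = \max_i f_i(y, z)$, and divides $(y, z)$ by it. Under the assumption $\lambda(y, z) > 0$, the result satisfies the constraints of (5) and has objective value $b^T y - u^T z = 1/\lambda(y, z)$.

```python
from typing import List, Tuple


def AT_times(A: List[List[float]], y: List[float]) -> List[float]:
    """Compute A^T y for A given as n rows of length m."""
    m = len(A[0])
    return [sum(A[j][i] * y[j] for j in range(len(A))) for i in range(m)]


def to_dual_solution(A, b, u, y, z) -> Tuple[List[float], List[float], float]:
    """Scale (y, z) in P by 1/lambda(y, z); returns (y', z', objective value of (5))."""
    f = [aty - zi for aty, zi in zip(AT_times(A, y), z)]  # f_i(y, z) = (A^T y)_i - z_i
    lam = max(f)                                          # lambda(y, z), assumed > 0
    y2 = [yj / lam for yj in y]
    z2 = [zi / lam for zi in z]
    obj = sum(bj * yj for bj, yj in zip(b, y2)) - sum(ui * zi for ui, zi in zip(u, z2))
    return y2, z2, obj
```

For instance, with $A = \begin{pmatrix}1 & 1\\ 0.05 & 0.57\end{pmatrix}$, $b = (3, 1)$, $u = (1, 4)$ and the point $y = (0.5, 0)$, $z = (0.5, 0)$ of $P$, we get $\lambda = 0.5$ and a solution of (5) with value 2.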

3 The Packing Algorithm 

For completeness, we outline an algorithm for the packing problem. Given a convex set $P \subseteq \mathbb{R}^n$ and $m$ convex non-negative functions $f_i : P \to \mathbb{R}_+$, $i \in [m]$, the packing problem is to compute

$$\lambda^* = \min_{y \in P} \lambda(y) \quad \text{where} \quad \lambda(y) := \max_{i \in [m]} f_i(y).$$

We refer to this problem as the primal. For a non-zero vector $x \in \mathbb{R}^m_+$, define $\bar{x} := x / \sum_i x_i$ as the normalized vector and define $d(x) := \min_{y \in P} \langle \bar{x}, f(y) \rangle$, where $\langle \bar{x}, f(y) \rangle = \sum_i \bar{x}_i f_i(y)$ denotes the inner product. It is easy to see that $d(x) \le \lambda^* \le \lambda(y)$ for any $x \ge 0$ and $y \in P$. We refer to the problem of finding $x \ge 0$ such that $d(x)$ is maximum as the dual. The duality theorem (see, e.g., Rockafellar [14], p. 393) implies that

$$\min_{y \in P} \lambda(y) = \max_{x \ge 0} d(x).$$

The algorithm assumes an oracle that, given $x \ge 0$, computes $y \in P$ such that

$$\langle \bar{x}, f(y) \rangle \le \exp(\epsilon) \cdot d(x) \qquad (8)$$

for a constant $\epsilon \in (0, 1)$.

Theorem 2. The algorithm in Figure 1 computes $x^* \ge 0$ and $y^* \in P$ such that $\lambda(y^*) \le \exp(3\epsilon) \cdot d(x^*)$. Duality then implies that $x^*$ and $y^*$ are near-optimum solutions to the dual and primal problems, respectively. The algorithm makes $O(m \epsilon^{-2} \log m)$ calls to the oracle (8).

For the proof of Theorem 2, we refer to [15]. 
(1)  $x := \mathbf{1}$    {$\mathbf{1}$ is a vector of all ones}
(2)  Find $y \in P$ such that $\langle \bar{x}, f(y) \rangle \le \exp(\epsilon) \cdot d(x)$    {oracle call}
(3)  $x^* := x$, $d^* := \langle \bar{x}, f(y) \rangle$    {current dual and its value}
(4)  $r := 0$    {initialize the round number}
(5)  Repeat
(6)      $r := r + 1$    {round $r$ begins}
(7)      Find $y_r \in P$ such that $\langle \bar{x}, f(y_r) \rangle \le \exp(\epsilon) \cdot d(x)$    {oracle call}
(8)      $w_r := 1 / \max_{i \in [m]} f_i(y_r)$    {pick $y_r$ to an extent $w_r$}
(9)      For $i \in [m]$ do $x_i := x_i \cdot \exp(\epsilon w_r f_i(y_r))$    {update the dual variables}
(10)     If $\langle \bar{x}, f(y_r) \rangle > d^*$ then $x^* := x$, $d^* := \langle \bar{x}, f(y_r) \rangle$    {keep track of the best dual}
(11) Until ($\max_{i \in [m]} x_i \ge m^{1/\epsilon}$)    {round $r$ ends}
(12) Output $x^*$ and $y^* = \left(\sum_{s=1}^{r} w_s y_s\right) / \left(\sum_{s=1}^{r} w_s\right)$

Fig. 1. The packing algorithm with exponential updates of the duals
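A minimal Python sketch of the loop in Figure 1, under stated assumptions. The oracle and the functions $f$ are passed in as callbacks (`oracle`, `f` and `packing` are hypothetical names). The toy instance takes $P$ to be the probability simplex in $\mathbb{R}^2$ with linear $f(y) = My$, so an exact oracle simply returns the best vertex. Unlike the literal order of steps (9) and (10), the sketch records as the candidate best dual the vector $x$ that was handed to the oracle in that round (before its update), because that is the vector whose $d(x)$ the oracle value bounds.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def normalized_inner(x: Vector, fy: Vector) -> float:
    """<x_bar, f(y)> with x_bar = x / sum(x)."""
    return sum(xi * fi for xi, fi in zip(x, fy)) / sum(x)


def packing(oracle: Callable[[Vector], Vector], f: Callable[[Vector], Vector],
            m: int, eps: float) -> Tuple[Vector, float, Vector]:
    """Sketch of Figure 1; returns (best dual x*, its oracle value d*, averaged primal y*)."""
    x = [1.0] * m                                        # step (1)
    y = oracle(x)                                        # step (2)
    x_best, d_best = x[:], normalized_inner(x, f(y))     # step (3)
    threshold = m ** (1.0 / eps)
    ys: List[Vector] = []
    ws: List[float] = []
    while True:                                          # steps (5)-(11)
        y_r = oracle(x)                                  # step (7)
        fy = f(y_r)
        value = normalized_inner(x, fy)
        if value > d_best:                               # best dual, recorded before the update
            x_best, d_best = x[:], value
        w_r = 1.0 / max(fy)                              # step (8); assumes max_i f_i(y_r) > 0
        x = [xi * math.exp(eps * w_r * fi) for xi, fi in zip(x, fy)]  # step (9)
        ys.append(y_r)
        ws.append(w_r)
        if max(x) >= threshold:                          # step (11)
            break
    total = sum(ws)                                      # step (12): weighted average of the y_r
    y_star = [sum(w * yr[k] for w, yr in zip(ws, ys)) / total for k in range(len(ys[0]))]
    return x_best, d_best, y_star


# Toy instance: P = simplex in R^2, f(y) = M y; the optimum value is 2 at y = (1/2, 1/2).
M = [[1.0, 3.0], [3.0, 1.0]]


def f(y: Vector) -> Vector:
    return [sum(M[i][k] * y[k] for k in range(2)) for i in range(2)]


def oracle(x: Vector) -> Vector:
    # Exact minimizer of <x_bar, M y> over the simplex: the best vertex e_k.
    s = sum(x)
    scores = [sum(x[i] / s * M[i][k] for i in range(2)) for k in range(2)]
    k = scores.index(min(scores))
    return [1.0 if t == k else 0.0 for t in range(2)]


x_star, d_star, y_star = packing(oracle, f, m=2, eps=0.1)
```

With an exact oracle, the usual multiplicative-weights argument gives $\lambda(y^*) \le \frac{e^{\epsilon} - 1}{\epsilon(1 - \epsilon)} \, d^*$, which is below $\exp(3\epsilon) \cdot d^*$ for $\epsilon = 0.1$.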



4 Implementing the Desired Oracle 



To solve (7) using the packing algorithm, we need an oracle that, given a non-zero vector $x \in \mathbb{R}^m_+$, computes $(y, z) \in P$ that minimizes $\bar{x}^T (A^T y - z)$ within a factor of $\exp(\epsilon)$, where $\bar{x} = x / \sum_i x_i$. Since $\sum_i x_i$ is fixed, we need to approximate

$$R(x) := \min_{(y,z) \in P} x^T (A^T y - z). \qquad (9)$$

Since $b^T y - u^T z = 1$ for $(y, z) \in P$, we have

$$R(x) = \min \left\{ \frac{x^T (A^T y - z)}{b^T y - u^T z} \;\middle|\; y, z \ge 0,\; A^T y - z \ge 0,\; b^T y - u^T z > 0 \right\}.$$

We now show how to find $(y, z)$ that achieves an approximate value of $R(x)$.

For $x \in \mathbb{R}^m_+$ and $a \ge 0$, define $(x|a) \in \mathbb{R}^m_+$ as the “truncated” vector with $(x|a)_i = \min\{x_i, a u_i\}$ for $i \in [m]$. Define

$$r_a(x) = \min_{j \in [n]} \frac{A_j (x|a)}{b_j},$$

where $A_j$ is the $j$th row of $A$. Given $x$ and $a$, we can compute $r_a(x)$ by making one call to oracle (3). The proof of the following lemma is omitted due to lack of space.
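The truncation and the function $r_a(x)$ translate directly into code. This Python sketch assumes an explicit matrix $A$ (a list of rows) in place of oracle (3); the names `truncate` and `r_a` are ours. On the small instance $A = \begin{pmatrix}1 & 1\\ 0.05 & 0.57\end{pmatrix}$, $b = (3, 1)$, $u = (1, 4)$, $x = \mathbf{1}$, it gives $r_{0.5}(x) = 0.5$ and $r_1(x) = 0.62$, consistent with $r_a$ being non-decreasing in $a$.

```python
from typing import List


def truncate(x: List[float], a: float, u: List[float]) -> List[float]:
    """(x|a)_i = min{x_i, a * u_i}."""
    return [min(xi, a * ui) for xi, ui in zip(x, u)]


def r_a(A: List[List[float]], b: List[float], u: List[float],
        x: List[float], a: float) -> float:
    """r_a(x) = min_j A_j (x|a) / b_j, i.e. one oracle (3) call on the truncated vector."""
    xa = truncate(x, a, u)
    return min(sum(a_ji * v for a_ji, v in zip(row, xa)) / bj for row, bj in zip(A, b))
```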

Lemma 2. For a fixed $x \in \mathbb{R}^m_+$, the function $r_a(x)$ is a non-decreasing and concave function of $a$.



Lemma 3. For any $x \in \mathbb{R}^m$ with $x_i > 0$ for $i \in [m]$, we have $r_{R(x)}(x) = R(x)$.

Proof. Let $y, z \ge 0$ be such that $A^T y - z \ge 0$, $b^T y - u^T z > 0$, and

$$R(x) = \frac{x^T (A^T y - z)}{b^T y - u^T z}.$$

It is no loss of generality to assume that the above minimum is achieved for $\langle \mathbf{1}, y \rangle = 1$; this can be achieved by scaling $y, z$. Recall that $R(x)$ is the minimum possible value of this fraction.
Claim. Without loss of generality, we can assume that the vector $z$ is of the form

$$z_i = \begin{cases} (A^T y)_i, & \text{if } x_i > R(x) \cdot u_i; \\ 0, & \text{otherwise.} \end{cases} \qquad (10)$$

Proof. Note that the coefficient of $z_i$ in the numerator is $x_i$, while its coefficient in the denominator is $u_i$. If $x_i / u_i \le R(x)$, we can set $z_i = 0$, since in this case, by decreasing $z_i$, the value of the fraction does not increase. If, on the other hand, $x_i / u_i > R(x)$, the variable $z_i$ must be set to its highest possible value, $(A^T y)_i$, since otherwise, by increasing $z_i$, we can decrease the value of the fraction. On increasing $z_i$, however, the value of the denominator $b^T y - u^T z$ may reach zero. But in this case, since $x_i / u_i > R(x)$, the numerator $x^T (A^T y - z)$ reaches zero “before” the denominator. Thus we would have $x^T (A^T y - z) = 0$ and $b^T y - u^T z > 0$ for some $y, z \ge 0$. Since $x_i > 0$ for $i \in [m]$, it must be that $A^T y - z = 0$. This, in turn, implies that (4) is unbounded and (2) is infeasible, yielding a contradiction. This proves the claim.

Now since $x^T (A^T y - z) = R(x) \cdot (b^T y - u^T z)$, we have $x^T A^T y - (x^T - R(x) \cdot u^T) z = R(x) \cdot b^T y$.

Lemma 4. If $z$ satisfies $z_i = (A^T y)_i$ if $x_i > a \cdot u_i$, and $z_i = 0$ otherwise, for some $a$, then we have $x^T A^T y - (x^T - a \cdot u^T) z = (x|a)^T A^T y$.

Proof. We show that $x_i (A^T y)_i - (x_i - a \cdot u_i) z_i = (x|a)_i (A^T y)_i$ holds for each $i \in [m]$. If $x_i - a \cdot u_i \le 0$, we have $z_i = 0$ and $(x|a)_i = x_i$, hence the equality holds. On the other hand, if $x_i - a \cdot u_i > 0$, we have $z_i = (A^T y)_i$ and $x_i (A^T y)_i - (x_i - a \cdot u_i) z_i = (a \cdot u_i)(A^T y)_i = (x|a)_i (A^T y)_i$. This proves Lemma 4.
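Lemma 4 is a coordinate-wise identity, so it is easy to spot-check. The following Python sketch (the helper name `lemma4_sides` is ours) builds $z$ exactly as in the hypothesis of the lemma and evaluates both sides on random non-negative data; it is a sanity check, not a substitute for the proof.

```python
import random
from typing import List, Tuple


def lemma4_sides(A: List[List[float]], x: List[float], u: List[float],
                 y: List[float], a: float) -> Tuple[float, float]:
    """Return (x^T A^T y - (x - a u)^T z, (x|a)^T A^T y) with z as in Lemma 4."""
    m = len(x)
    aty = [sum(A[j][i] * y[j] for j in range(len(A))) for i in range(m)]   # A^T y
    z = [aty[i] if x[i] > a * u[i] else 0.0 for i in range(m)]
    lhs = sum(x[i] * aty[i] for i in range(m)) - sum((x[i] - a * u[i]) * z[i] for i in range(m))
    xa = [min(x[i], a * u[i]) for i in range(m)]                           # (x|a)
    rhs = sum(xa[i] * aty[i] for i in range(m))
    return lhs, rhs


random.seed(0)
for _ in range(1000):
    n, m = random.randint(1, 4), random.randint(1, 4)
    A = [[random.random() for _ in range(m)] for _ in range(n)]
    x = [random.uniform(0.1, 5.0) for _ in range(m)]
    u = [random.uniform(1.0, 5.0) for _ in range(m)]
    y = [random.random() for _ in range(n)]
    lhs, rhs = lemma4_sides(A, x, u, y, random.uniform(0.0, 3.0))
    assert abs(lhs - rhs) <= 1e-9 * max(1.0, abs(rhs))
```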

Thus we have $(x|R(x))^T A^T y = R(x) \cdot b^T y$, i.e.,

$$R(x) = \frac{(x|R(x))^T A^T y}{b^T y}.$$

Further, we can assume that $y$ is of the form

$$y_j = 1 \text{ for a row } j \in [n] \text{ such that } A_j (x|R(x)) / b_j \text{ is minimum}, \qquad y_{j'} = 0 \text{ for } j' \ne j. \qquad (11)$$

This is true because by replacing $y$ with the vector as defined above, the value of the ratio does not increase. Therefore we indeed have $r_{R(x)}(x) = \min_j A_j (x|R(x)) / b_j = R(x)$, and the proof of Lemma 3 is complete.

Lemma 5. For any $x \in \mathbb{R}^m$ with $x_i > 0$ for $i \in [m]$, we have $r_a(x) < a$ if and only if $a > R(x)$.

Proof. Since $r_a(x)$ is a concave function of $a$ and $r_0(x) = 0$, $r_{R(x)}(x) = R(x)$, we have $r_a(x) \ge a$ for $0 \le a \le R(x)$. Therefore $r_a(x) < a$ implies $a > R(x)$.

Now let $a > R(x)$. Again using the concavity of $r_a$, we have $r_a(x) \le a$. We now show that strict inequality holds. Let $(y, z)$ be vectors as given in (11) and (10) that achieve the minimum ratio $R(x)$. Note that we have $(A^T y)_i - z_i > 0$ for some $i \in [m]$; otherwise the argument in the proof of Lemma 3 implies that (2) is infeasible. For such $i \in [m]$, we have $(A^T y)_i > 0$ and $z_i = 0$. This implies $x_i \le R(x) \cdot u_i$ and $A_{ji} > 0$ for the index $j \in [n]$ defined in (11). Note that $x_i \le R(x) \cdot u_i < a \cdot u_i$ implies $(x|a)_i = (x|R(x))_i = x_i > 0$. This, combined with $A_{ji} > 0$ and $(x|a) \le a/R(x) \cdot (x|R(x))$ (strictly in coordinate $i$, since $a > R(x)$), implies that $A_j (x|a) / b_j < a/R(x) \cdot A_j (x|R(x)) / b_j = a/R(x) \cdot R(x) = a$. Since this holds for all $j \in [n]$ that attain the minimum in (11), we have $r_a(x) < a$. Thus Lemma 5 is proved.

Fig. 2. The possible forms of the function $r_a(x)$ for a fixed $x > 0$, plotted against $a$ with $R(x)$ marked on the horizontal axis: (a) the case $\min_{j \in [n]} A_j u / b_j > 1$; (b) the case $\min_{j \in [n]} A_j u / b_j = 1$.

Now from Lemmas 2, 3, and 5, the function $r_a(x)$ must be of the form given in Figure 2. Fix an $x > 0$. Consider an $a \in [0, \min_{i \in [m]} x_i / u_i]$. Note that $(x|a) = a \cdot u$. Therefore we have $r_a(x) = a \cdot \min_{j \in [n]} A_j u / b_j$. Since (2) is feasible, we have $\min_{j \in [n]} A_j u / b_j \ge 1$.

Now if $\min_{j \in [n]} A_j u / b_j > 1$, we have $r_a(x) > a$ for every $a > 0$ in this range. In this case, from Lemmas 2, 3, and 5, we get that $a = 0$ and $a = R(x)$ are the only values of $a$ satisfying $r_a(x) = a$, and the function $r_a$ has the form given in Figure 2(a). On the other hand, if $\min_{j \in [n]} A_j u / b_j = 1$, we have $r_a(x) = a$ in this range. In this case, the function has the form given in Figure 2(b). In either case, $R(x)$ is the largest value of $a$ such that $r_a(x) = a$.

4.1 Computing R(1)

In the packing algorithm (Figure 1), we initialize $x = \mathbf{1}$. In this section, we describe how to compute $R(\mathbf{1})$ and prove the following lemma.

Lemma 6. The exact value of $R(\mathbf{1})$ can be computed by making $O(n + \log m)$ calls to the oracle (3). Furthermore, given $\epsilon \in (0, 1)$, an $\exp(\epsilon)$ approximation to $R(\mathbf{1})$ can be computed by making $O(\log \log C + \log(1/\epsilon))$ calls to the oracle (3).

Lemma 7. $R(\mathbf{1}) \ge 1 / \max_i u_i = 1/C$. Further, if $R(\mathbf{1}) \ge 1 / \min_i u_i = 1$, then $R(\mathbf{1}) = \min_{j \in [n]} A_j \mathbf{1} / b_j$.

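Proof. We first argue that $R(u) \ge 1$. Recall that

$$R(u) = \min_{(y,z)} \frac{u^T (A^T y - z)}{b^T y - u^T z},$$

where the minimum is taken over $(y, z) \ge 0$ such that $A^T y - z \ge 0$ and $b^T y - u^T z > 0$. Since (2) is feasible, we have $\min_{j \in [n]} A_j u / b_j \ge 1$, i.e., $u^T A^T y \ge b^T y$ for every $y \ge 0$. Hence to minimize the above ratio, one has to set $z$ to zero, and this gives $R(u) = \min_{j \in [n]} A_j u / b_j \ge 1$. Since $C \cdot \mathbf{1} \ge u$ and $R$ is non-decreasing in $x$, we have $R(C \cdot \mathbf{1}) \ge R(u) \ge 1$. Therefore, by homogeneity, $R(\mathbf{1}) = R(C \cdot \mathbf{1}) / C \ge 1/C$.

Now assume $R(\mathbf{1}) \ge 1$. Since $u \ge \mathbf{1}$, we have $\min\{1, R(\mathbf{1}) \cdot u_i\} = 1$ for each $i \in [m]$. Therefore $(\mathbf{1}|R(\mathbf{1})) = \mathbf{1}$. From Lemma 3, we have $R(\mathbf{1}) = r_{R(\mathbf{1})}(\mathbf{1})$. Therefore, $R(\mathbf{1}) = \min_{j \in [n]} A_j \mathbf{1} / b_j$ as desired.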

Computing R(1) approximately. Now we describe a simple procedure to compute an $\exp(\epsilon)$ approximation to $R(\mathbf{1})$, where $\epsilon \in (0, 1)$. We first compute $r_1(\mathbf{1})$. If $r_1(\mathbf{1}) \ge 1$, Lemma 5 implies that $R(\mathbf{1}) \ge 1$. In this case, Lemma 7 gives that $R(\mathbf{1}) = \min_{j \in [n]} A_j \mathbf{1} / b_j$, and we can compute this value by making one more oracle call.

On the other hand, if $r_1(\mathbf{1}) < 1$, then we have $1/C \le R(\mathbf{1}) < 1$. In such a case, we have lower and upper bounds on $R(\mathbf{1})$ within a factor $C$. We do a bisection search as follows. We set $a = 1/\sqrt{C}$ and compute $r_a(\mathbf{1})$. If $r_a(\mathbf{1}) \ge a$, Lemma 5 implies that $R(\mathbf{1}) \in [a, 1)$. On the other hand, if $r_a(\mathbf{1}) < a$, we get that $R(\mathbf{1}) \in [1/C, a)$. In either case, we reduce the ratio of upper and lower bounds from $C$ to $\sqrt{C}$. By repeating this $O(\log \log C + \log(1/\epsilon))$ times, we can reduce the ratio of upper and lower bounds to $\exp(\epsilon)$. Thus we can compute $a$ such that $\exp(-\epsilon) \cdot a \le R(\mathbf{1}) < a$.
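The bisection can be sketched in a few lines of Python. Assumptions: $A$ is given explicitly as non-negative rows, the instance is feasible and already scaled so that $\min_i u_i = 1$; the names `r_a` and `approx_R_ones` are ours. In the case $r_1(\mathbf{1}) \ge 1$ the sketch returns $\exp(\epsilon)$ times the exact value, which is one way (our choice) to produce an $a$ with the same bracket $\exp(-\epsilon) \cdot a \le R(\mathbf{1}) < a$.

```python
import math
from typing import List


def r_a(A, b, u, x, a):
    """r_a(x) = min_j A_j (x|a) / b_j  (one call to oracle (3))."""
    xa = [min(xi, a * ui) for xi, ui in zip(x, u)]
    return min(sum(c * v for c, v in zip(row, xa)) / bj for row, bj in zip(A, b))


def approx_R_ones(A: List[List[float]], b: List[float], u: List[float], eps: float) -> float:
    """Return a with exp(-eps) * a <= R(1) < a (assumes min(u) == 1 and a feasible instance)."""
    ones = [1.0] * len(u)
    C = max(u)
    if r_a(A, b, u, ones, 1.0) >= 1.0:
        # Lemmas 5 and 7: R(1) = min_j A_j 1 / b_j exactly; inflate it by exp(eps)
        # so that the result satisfies the same bracket as the bisection branch.
        exact = min(sum(row) / bj for row, bj in zip(A, b))
        return math.exp(eps) * exact
    lo, hi = 1.0 / C, 1.0              # invariant: lo <= R(1) < hi
    while hi / lo > math.exp(eps):
        a = math.sqrt(lo * hi)         # geometric midpoint: the ratio hi/lo is square-rooted
        if r_a(A, b, u, ones, a) >= a:
            lo = a                     # Lemma 5: r_a(1) >= a implies R(1) >= a
        else:
            hi = a                     # Lemma 5: r_a(1) < a implies R(1) < a
    return hi
```

Because each step square-roots the ratio $hi/lo$, the number of iterations is about $\log_2(\ln C / \epsilon)$, matching the $O(\log \log C + \log(1/\epsilon))$ bound in the text.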



Computing R(1) exactly. We use the fact that $R(\mathbf{1})$ is the maximum value of $a$ such that $r_a(\mathbf{1}) = a$ (see Figure 2) to compute the exact value of $R(\mathbf{1})$. Reorder the indices $i \in [m]$ such that $1 = u_1 \le u_2 \le \cdots \le u_m = C$. We first compute $r_1(\mathbf{1})$. If $r_1(\mathbf{1}) \ge 1/u_1 = 1$, then Lemmas 5 and 7 imply that $R(\mathbf{1}) = \min_{j \in [n]} A_j \mathbf{1} / b_j$, and we can compute it exactly. We therefore assume that $1/u_m \le R(\mathbf{1}) < 1/u_1$. Given $i \in [m]$, we can determine whether $R(\mathbf{1}) < 1/u_i$. This is done by computing $r_a(\mathbf{1})$ with $a = 1/u_i$ and comparing its value with $a$ (see Lemma 5). Therefore, by making $O(\log m)$ oracle calls, we can compute the (unique) $k \in [m-1]$ such that $1/u_{k+1} \le R(\mathbf{1}) < 1/u_k$. Now note that for $a \in [1/u_{k+1}, 1/u_k]$, we have

$$(\mathbf{1}|a)_i = \min\{1, a u_i\} = \begin{cases} a u_i, & \text{for } i = 1, \ldots, k; \\ 1, & \text{for } i = k+1, \ldots, m. \end{cases}$$

Therefore, for any row $j \in [n]$, $A_j (\mathbf{1}|a) / b_j$ is a linear function of $a$ in the range $a \in [1/u_{k+1}, 1/u_k]$. In Figure 3, we show the linear functions corresponding to some rows $j_0$ and $j_1$. Recall that the function $r_a(\mathbf{1})$ is the “lower envelope” of these functions, and our objective is to compute $R(\mathbf{1})$, which is the largest value of $a$ such that $r_a(\mathbf{1}) = a$. Thus $R(\mathbf{1})$ is the point at which the lower envelope intersects the line corresponding to the linear function $a$, which is shown dotted in Figure 3.

We first make an oracle call to compute $j_0 = \arg\min_{j \in [n]} A_j (\mathbf{1}|a) / b_j$ where $a = 1/u_k$. (Refer to Figure 3.) The solid line corresponds to the linear function $A_{j_0} (\mathbf{1}|a) / b_{j_0}$. We assume that the oracle also returns the most-violated constraint. Since we know the row $A_{j_0}$, we can compute $a_{j_0}$, which is the value of $a$ for which $A_{j_0} (\mathbf{1}|a) / b_{j_0} = a$. As shown in the figure, $a_{j_0}$ is the value of $a$ at which the dotted and the solid lines intersect.
Fig. 3. Eliminating rows one-by-one to compute R(1) exactly

We then make another oracle call to compute $j_1 = \arg\min_{j \in [n]} A_j (\mathbf{1}|a_{j_0}) / b_j$. The light line in Figure 3 corresponds to the linear function $A_{j_1} (\mathbf{1}|a) / b_{j_1}$. If $j_1 = j_0$, then, since the solid line goes below the dotted line for $a > a_{j_0}$, we know that $a_{j_0}$ is the largest value of $a$ such that $r_a(\mathbf{1}) = a$, and $R(\mathbf{1}) = a_{j_0}$. On the other hand, if $j_1 \ne j_0$, we find $a_{j_1}$, which is the value of $a$ at which the dotted and the light lines intersect. We continue this to compute rows $j_2, j_3, \ldots$ until we find $j_{l+1} = j_l$ for some $l \ge 0$. Since $R(\mathbf{1})$ itself is a value of $a$ that corresponds to the intersection of the dotted line and the line corresponding to the function $A_{j'} (\mathbf{1}|a) / b_{j'}$ for some row $j' \in [n]$, after $O(n)$ such iterations we find $j_{l+1} = j_l = j'$ for some $l \ge 0$. Thus we can compute $R(\mathbf{1})$ by making $O(n + \log m)$ calls to the oracle (3).
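The exact procedure (binary search for $k$, then walking down the lower envelope) can be sketched in Python as follows. The sketch assumes a covering instance with non-negative $A$ and positive $b$ given explicitly, with $u$ already sorted so that $u_1 = 1$ (0-based `u[0] == 1` in the code). The names `r_and_argmin` and `exact_R_ones` are ours. As the stopping rule it tests whether the envelope touches the diagonal at the current $a$, which is equivalent to $j_{l+1} = j_l$ up to ties and is more robust in floating point.

```python
from typing import List


def r_and_argmin(A, b, u, x, a):
    """One call to oracle (3) on (x|a): returns (r_a(x), a minimizing row j)."""
    xa = [min(xi, a * ui) for xi, ui in zip(x, u)]
    vals = [sum(c * v for c, v in zip(row, xa)) / bj for row, bj in zip(A, b)]
    j = min(range(len(vals)), key=vals.__getitem__)
    return vals[j], j


def exact_R_ones(A: List[List[float]], b: List[float], u: List[float],
                 tol: float = 1e-12) -> float:
    """Exact R(1) by the lower-envelope walk; assumes u sorted ascending with u[0] == 1."""
    m = len(u)
    ones = [1.0] * m
    r1, _ = r_and_argmin(A, b, u, ones, 1.0)
    if r1 >= 1.0:                                    # Lemmas 5 and 7
        return min(sum(row) / bj for row, bj in zip(A, b))
    lo, hi = 0, m - 1                                # R(1) < 1/u[lo] and R(1) >= 1/u[hi]
    while hi - lo > 1:                               # O(log m) oracle calls
        mid = (lo + hi) // 2
        r, _ = r_and_argmin(A, b, u, ones, 1.0 / u[mid])
        if r < 1.0 / u[mid]:                         # Lemma 5: R(1) < 1/u[mid]
            lo = mid
        else:
            hi = mid
    k = lo                                           # 1/u[k+1] <= R(1) < 1/u[k]

    def line(j):
        # On [1/u[k+1], 1/u[k]]: A_j (1|a) / b_j = alpha * a + beta.
        alpha = sum(A[j][i] * u[i] for i in range(k + 1)) / b[j]
        beta = sum(A[j][i] for i in range(k + 1, m)) / b[j]
        return alpha, beta

    _, j = r_and_argmin(A, b, u, ones, 1.0 / u[k])   # row j_0
    while True:
        alpha, beta = line(j)
        a = beta / (1.0 - alpha)                     # where row j's line meets the diagonal
        r, j_next = r_and_argmin(A, b, u, ones, a)
        if r >= a - tol * max(1.0, a):               # envelope touches the diagonal: a = R(1)
            return a
        j = j_next
```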

Computing (y, z) ∈ P that approximates R(1) well. Given an approximate or an exact value of $R(\mathbf{1})$, we still need to find $(y, z) \in P$ that achieves an approximation to $R(\mathbf{1})$.

Lemma 8. Given $x \in \mathbb{R}^m_+$ such that $x_i > 0$ for $i \in [m]$, and $a$ such that $\exp(-\epsilon) \cdot a \le R(x) < a$, we can compute $(y, z) \in P$ such that $x^T (A^T y - z) \le \exp(\epsilon) \cdot R(x)$ by making one call to the oracle (3).

Proof. We use an argument similar to the one given in the proof of Lemma 3. We compute $j \in [n]$ such that $A_j (x|a) / b_j$ is minimum and set $y_j = 1$ and $y_{j'} = 0$ for $j' \ne j$. Therefore we have $(x|a)^T A^T y = r_a(x) \cdot b^T y$. We also set

$$z_i = \begin{cases} (A^T y)_i, & \text{if } x_i > a \cdot u_i; \\ 0, & \text{otherwise.} \end{cases}$$

Now Lemma 4 implies that $x^T A^T y - (x^T - a \cdot u^T) z = (x|a)^T A^T y$. Hence $x^T A^T y - (x^T - a \cdot u^T) z = r_a(x) \cdot b^T y$. This, in turn, implies $x^T (A^T y - z) = r_a(x) \cdot b^T y - a \cdot u^T z < a (b^T y - u^T z)$. The strict inequality follows from $r_a(x) < a$ (Lemma 5) and $b^T y > 0$. Therefore we have $b^T y - u^T z > 0$ and

$$\frac{x^T (A^T y - z)}{b^T y - u^T z} < a \le \exp(\epsilon) \cdot R(x).$$

Scaling $(y, z)$ by $1/(b^T y - u^T z)$ gives the desired point of $P$. Thus the proof is complete.

Using the above lemma, we can compute $(y, z) \in P$ that achieves an $\exp(\epsilon)$ approximation to $R(\mathbf{1})$.
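The construction in the proof of Lemma 8 is short enough to write out. This Python sketch assumes an explicit matrix $A$ (rows) and that the caller supplies an $a$ with $\exp(-\epsilon) \cdot a \le R(x) < a$; the name `lemma8_point` is ours. It performs the single oracle (3) call on $(x|a)$, builds $y = e_j$ and $z$ as in the proof, and rescales into $P$.

```python
from typing import List, Tuple


def lemma8_point(A: List[List[float]], b: List[float], u: List[float],
                 x: List[float], a: float) -> Tuple[List[float], List[float]]:
    """Build (y, z) in P as in the proof of Lemma 8 (assumes R(x) < a)."""
    n, m = len(A), len(x)
    xa = [min(xi, a * ui) for xi, ui in zip(x, u)]                  # (x|a)
    ratios = [sum(A[j][i] * xa[i] for i in range(m)) / b[j] for j in range(n)]
    j = min(range(n), key=ratios.__getitem__)                       # the one oracle (3) call
    y = [1.0 if jj == j else 0.0 for jj in range(n)]                # y = e_j
    aty = [A[j][i] for i in range(m)]                               # A^T e_j is row j
    z = [aty[i] if x[i] > a * u[i] else 0.0 for i in range(m)]
    denom = b[j] - sum(ui * zi for ui, zi in zip(u, z))             # b^T y - u^T z > 0
    return [v / denom for v in y], [v / denom for v in z]           # rescale into P
```

On the instance $A = \begin{pmatrix}1 & 1\\ 0.05 & 0.57\end{pmatrix}$, $b = (3, 1)$, $u = (1, 4)$, $x = \mathbf{1}$ (where $R(\mathbf{1}) = 0.5$) and $a = 0.52$, it returns $y = (0.5, 0)$, $z = (0.5, 0)$, whose objective value $0.5$ is below $a$.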
5 Algorithm for Solving (7)

In the packing algorithm (Figure 1), we initialize $x = \mathbf{1}$. We set $\epsilon = \omega/4$, where $\omega \in (0, 1)$ is the given error parameter. As explained in the previous section, we compute an $a$ such that $\exp(-\epsilon) \cdot a \le R(x) < a$. Using Lemma 8, we compute $(y, z) \in P$ that forms an $\exp(\epsilon)$ approximation to the oracle call and update the “dual variables” $x$ (step (9) in Figure 1). During the algorithm, the values of $x_i$ for $i \in [m]$ never decrease. Therefore, the value of

$$R(x) = \min_{(y,z) \in P} x^T (A^T y - z)$$

also never decreases. We repeat this until we find that $R(x) \ge a$; this can be detected by checking whether $r_a(x) \ge a$ (Lemma 5). In such a case, we update the value of $a$ as $a := \exp(\epsilon) \cdot a$ and repeat.^

Let $a_0$ be the initial value of $a$. Note that we have $R(\mathbf{1}) \in [a_0 \exp(-\epsilon), a_0)$. We say that the algorithm is in phase $p \ge 0$ if the current value of $a$ is $a_0 \exp(p\epsilon)$.
Bounding the number of oracle calls. We designate an oracle call in which we do not find the desired $(y, z) \in P$ as “unfruitful”. After each unfruitful oracle call, we update the value of $a$ and go into the next phase. Hence the number of unfruitful calls to (3) is at most the number of phases in the algorithm. The following lemma bounds the number of phases in the algorithm.

Lemma 9. The algorithm has $O(\epsilon^{-2} \log m)$ phases.

Proof. The algorithm initializes $x = \mathbf{1}$ and terminates when $x_i \ge m^{1/\epsilon}$ for some $i \in [m]$ (step (11) in Figure 1). Thus the maximum value $a$ can take is $a_0 \cdot \exp(\epsilon) \cdot m^{1/\epsilon}$. Since the value of $a$ increases by a factor of $\exp(\epsilon)$ in each phase, the number of phases in the algorithm is $O(\epsilon^{-2} \log m)$.
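The counting in the proof of Lemma 9 is plain arithmetic: phase $p$ has $a = a_0 \exp(p\epsilon)$, and $a$ never exceeds $a_0 \exp(\epsilon) m^{1/\epsilon}$, so $p \le 1 + \ln(m)/\epsilon^2$. A small Python helper (hypothetical, for illustration only) evaluating this bound, with phases numbered from 0:

```python
import math


def max_phases(m: int, eps: float) -> int:
    """Upper bound on the number of phases implied by the proof of Lemma 9.

    p * eps <= eps + ln(m) / eps, i.e. p <= 1 + ln(m) / eps**2, and p starts at 0.
    """
    return math.floor(1 + math.log(m) / eps ** 2) + 1
```

For example, with $m = 100$ and $\epsilon = 0.1$ the bound is 462 phases.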

Thus the number of unfruitful oracle calls is $O(\epsilon^{-2} \log m)$. Call an oracle call in which we find $(y, z) \in P$ satisfying the desired inequality “fruitful”. Clearly the number of fruitful oracle calls is equal to the number of oracle calls in the packing algorithm, and from Theorem 2, this is $O(m \epsilon^{-2} \log m)$. In the pre-processing, to compute an approximate value of $R(\mathbf{1})$, we make $O(\min\{n + \log m, \log \log C + \log(1/\epsilon)\})$ oracle calls. Thus the total number of calls to the oracle (3) in the algorithm is $O(m \epsilon^{-2} \log m + \min\{n + \log m, \log \log C + \log(1/\epsilon)\}) = O(m \omega^{-2} \log m + \min\{n, \log \log C\})$.

Proving feasibility and near-optimality of the solutions. Now from Theorem 2, the algorithm outputs $\tilde{x} \ge 0$ and $(\tilde{y}, \tilde{z}) \in P$ such that

$$\lambda(\tilde{y}, \tilde{z}) \le \exp(3\epsilon) \cdot d(\tilde{x}), \qquad (12)$$

where

$$\lambda(y, z) = \max_{i \in [m]} \left( (A^T y)_i - z_i \right) \quad \text{and} \quad d(x) = \min_{(y,z) \in P} \langle \bar{x}, f(y, z) \rangle, \qquad (13)$$

^ This technique of updating $a$ in multiples of $\exp(\epsilon)$ was first used by Fleischer [10] to reduce the running time of the maximum multicommodity flow algorithm.



Fractional Covering with Upper Bounds on the Variables 



381 



where $\bar{x} = x / \sum_i x_i$. Recall that the algorithm (Figure 1) outputs the best dual solution found in any round, and hence

$$\tilde{x} = x^{r'} \quad \text{where} \quad r' = \arg\max_{1 \le r \le N} d(x^r),$$

where $x^r$ denotes the value of $x$ at the end of round $r$ and $N$ is the number of rounds. Let $p$ be the phase in which the algorithm computes the best dual solution $\tilde{x}$. Let $\hat{x}$ be the value of the dual variables $x$ at the end of phase $p - 1$. Let $a_{p-1} = a_0 \exp((p-1)\epsilon)$ be the value of $a$ in phase $p - 1$.



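Lemma 10. The vectors $x^* = (\hat{x}|a_{p-1}) / a_{p-1}$ and $(y^*, z^*) = (\tilde{y}, \tilde{z}) / \lambda(\tilde{y}, \tilde{z})$ form feasible solutions to the primal linear program (2) and the dual linear program (4), respectively. Furthermore,

$$\mathbf{1}^T x^* \le \exp(\omega) \cdot (b^T y^* - u^T z^*).$$

Thus $x^*$ and $(y^*, z^*)$ form near-optimum solutions to (2) and (4), respectively.

Proof. Since $(\hat{x}|a_{p-1}) \le a_{p-1} u$, we have $x^* \le u$. Since the oracle call with $x = \hat{x}$ was unfruitful at the end of phase $p - 1$, we have $r_{a_{p-1}}(\hat{x}) \ge a_{p-1}$. Therefore $\min_{j \in [n]} A_j (\hat{x}|a_{p-1}) / b_j \ge a_{p-1}$, and hence $\min_{j \in [n]} A_j x^* / b_j \ge 1$. Thus $x^*$ forms a feasible solution to (2). From the definition of $\lambda(y, z)$, it is clear that $A^T y^* - z^* \le \mathbf{1}$, so that $(y^*, z^*)$ forms a feasible solution to (4).

Now let $a_p = a_0 \exp(p\epsilon)$ be the value of $a$ in phase $p$. We have

$$\begin{aligned}
\mathbf{1}^T x^* &= \frac{\sum_i (\hat{x}|a_{p-1})_i}{a_{p-1}} \\
&\le \frac{\sum_i \tilde{x}_i}{a_{p-1}} && \text{(since $x$ does not decrease, we have $\tilde{x} \ge \hat{x} \ge (\hat{x}|a_{p-1})$)} \\
&= \exp(\epsilon) \cdot \frac{\sum_i \tilde{x}_i}{a_p} && \text{(since $a_p = \exp(\epsilon) \cdot a_{p-1}$).}
\end{aligned}$$

From the definitions (9) and (13) of $R(x)$ and $d(x)$, respectively, we have $R(\tilde{x}) = d(\tilde{x}) \cdot \sum_i \tilde{x}_i$. Since the oracle call with $x = \tilde{x}$ was fruitful in phase $p$, we have $a_p > R(\tilde{x}) = d(\tilde{x}) \cdot \sum_i \tilde{x}_i$. Therefore

$$\begin{aligned}
\mathbf{1}^T x^* &< \exp(\epsilon) \cdot \frac{1}{d(\tilde{x})} \\
&\le \exp(4\epsilon) \cdot \frac{1}{\lambda(\tilde{y}, \tilde{z})} && \text{(from (12))} \\
&= \exp(\omega) \cdot \frac{b^T \tilde{y} - u^T \tilde{z}}{\lambda(\tilde{y}, \tilde{z})} && \text{(since $(\tilde{y}, \tilde{z}) \in P$ and $\omega = 4\epsilon$)} \\
&= \exp(\omega) \cdot (b^T y^* - u^T z^*).
\end{aligned}$$

Therefore the proof is complete.

Note that we can modify the algorithm to keep track of the values of the dual variables at the end of the previous round and thus be able to compute $\hat{x}$, $a_{p-1}$, and hence $x^*$. Thus the proof of Theorem 1 is complete.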
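The primal half of Lemma 10 is easy to check mechanically: whenever $r_a(\hat{x}) \ge a$, the rescaled truncation $(\hat{x}|a)/a$ covers every row and respects the upper bounds. A Python sketch under the assumption of an explicit $A$ (the names `primal_from_dual` and `is_primal_feasible` are ours):

```python
from typing import List


def primal_from_dual(x_hat: List[float], a: float, u: List[float]) -> List[float]:
    """x* = (x_hat | a) / a, the candidate solution of (2) from Lemma 10."""
    return [min(xi, a * ui) / a for xi, ui in zip(x_hat, u)]


def is_primal_feasible(A: List[List[float]], b: List[float], u: List[float],
                       x: List[float], tol: float = 1e-12) -> bool:
    """Check A x >= b and 0 <= x <= u for the covering LP (2)."""
    cover = all(sum(c * v for c, v in zip(row, x)) >= bj - tol for row, bj in zip(A, b))
    return cover and all(-tol <= xi <= ui + tol for xi, ui in zip(x, u))
```

For example, on the instance $A = \begin{pmatrix}1 & 1\\ 0.05 & 0.57\end{pmatrix}$, $b = (3, 1)$, $u = (1, 4)$ with $\hat{x} = \mathbf{1}$ and $a = 0.5$ (where $r_{0.5}(\mathbf{1}) = 0.5$), this gives $x^* = (1, 2)$, a feasible cover of cost 3.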
6 Conclusion

We presented a technique to handle negative entries in the objective function and the constraints via Lagrangian relaxation. We applied this technique to the fractional covering problem with upper bounds on the variables. We reduced the dual of this problem to the standard packing framework and derived an algorithm for this problem. However, the technique seems to be very limited in its applicability, since it needs an optimum solution with non-negative values for the objective function and the constraints. The negative entries can also be handled by working with perturbed functions, as done in [15]. However, there is still no fairly general and satisfactory way of taking care of negative entries.
Abstract. Negotiation-range mechanisms offer a novel approach to achieving efficient markets based on finding the maximum weighted matching in a weighted bipartite graph connecting buyers and sellers. Unlike typical markets, negotiation-range mechanisms establish negotiation terms between paired bidders rather than set a final price for each transaction. This subtle difference allows single-unit heterogeneous negotiation-range markets to achieve desirable properties that cannot coexist in typical markets. This paper extends the useful properties of negotiation-range mechanisms to include coalition-resistance, making them the first markets known to offer protection from coalitions. Additionally, the notion of negotiation-range mechanisms is extended to include a restricted setting of combinatorial markets.^



1 Introduction 

Recently much research has aimed at designing mechanisms intended to facilitate trade between market players with conflicting goals. Such mechanisms are often targeted for use on the Internet in exchanges intended to achieve efficiency by increasing sales volume and disintermediating agents. Such mechanisms must be carefully designed to motivate truthful behavior on the part of individual participants while maintaining underlying commercial viability. Though motivating individuals to behave in accordance with the market’s interests is itself important, the larger goal, given the emergence of massively distributed exchanges, should also include protecting the exchange from coordinated attacks by groups of aligned individuals. This paper introduces a class of mechanisms that can be used to efficiently exchange noncommodity goods without necessarily disintermediating agents; these are the first markets to look beyond the perspective of a single player by offering protection against some forms of group manipulation.

The aforementioned combinatorial market extends the concept of negotiation-based mechanisms in the context of the theory of truthful efficient markets introduced by [1]. A market consists of multiple buyers and sellers who wish to exchange goods. The market’s main objective is to produce an allocation of sellers’ goods to buyers so as to maximize the total gain from trade.

^ This is an extended abstract; a full version is available on the author’s web site.
A commonly studied model of participant behavior is taken from the field of economic mechanism design [4,12]. In this model each player has a private valuation function that assigns real values to each possible allocation. The algorithm motivates players to participate truthfully by handing payments to them.

In recent years much research has focused on designing mechanisms that allow 
exchange between multiple buyers and a single seller. Combinatorial auctions are 
a special group of such mechanisms and allow buyers to submit combinatorial re- 
quests for goods to a single seller. Almost all combinatorial auction mechanisms 
designed to date only apply to restricted “single minded” participants, partici- 
pants who only desire a single bundle. The only polynomial non single-minded 
combinatorial auction mechanism known is [2]. 

A more general and desirable mechanism is an exchange where participants can submit both buyer bids, i.e. requests to buy items for no more than a given price, and seller bids (also called “asks”), i.e. requests to sell items for at least a given price. The mechanism in an exchange collects buyer bids and seller bids and clears the exchange by computing: (i) a set of trades, and (ii) the payments made and received by players. In designing a mechanism to compute trades and payments, one must consider the bidding strategies of self-interested players, i.e. rational players that follow expected-utility maximizing strategies. Allocative efficiency is set as the primary goal: to compute a set of trades that maximizes value. In addition, individual rationality (IR) is required so that all players have positive utility to participate, budget balance (BB) so that the exchange does not run at a loss, and incentive compatibility (IC) so that reporting the truth is a dominant strategy for each player.

Unfortunately, Myerson and Satterthwaite’s (1983) well-known result demonstrates that in bilateral trade it is impossible to simultaneously achieve perfect efficiency, BB, and IR using an IC mechanism [11]. A unique approach to overcome Myerson and Satterthwaite’s impossibility result was attempted by [14]. That work designs both a regular and a combinatorial bilateral trade mechanism (which imposes BB and IR) that approximates truth revelation and allocation efficiency.

Negotiation-range mechanisms circumvent Myerson and Satterthwaite’s im- 
possibility result by relaxing the requirement to set a final price. A negotiation- 
range mechanism does not produce payment prices for the market participants. 
Rather, it assigns each buyer-seller pair a price range, called a Zone Of Possible 
Agreements (ZOPA). The buyer is provided with the high end of the range and 
the seller with the low end of the range. This allows the parties to engage in 
negotiation over the final price while promising that the deal is beneficial for 
each of them. 

This paper focuses on a restricted combinatorial extension to the single-unit heterogeneous setting of [1]. In that setting n sellers offer one unique good each by placing a sealed bid specifying their willingness to sell, and m buyers, interested in buying a single good each, place up to two sealed bids for goods they are interested in buying. The restricted combinatorial extension is an incentive compatible bilateral trade negotiation-range mechanism (ZOPAC) that is efficient, individually rational and budget balanced.

Like [1], this result does not contradict Myerson and Satterthwaite’s impor- 
tant theorem. We present a two stage process. The first stage, ZOPAC mecha- 
nism, establishes buyer-seller matchings and negotiation terms, and the second 
stage allows bidders to negotiate on final prices. Although the first stage equips 
buyers and sellers with beneficial deals and the best negotiation terms, the sec- 
ond stage cannot promise that every deal will always finalize. For example if both 
parties in the negotiation process are aggressive negotiators they may never final- 
ize the deal. However ZOPAC’s negotiation range guarantees that each pairing 
is better off making a deal than not. 

In a negotiation-range mechanism the range of prices each matched pair 
is given is resolved by a negotiation stage where a final price is determined. 
This negotiation stage is crucial for these mechanisms to be IC. Intuitively, a 
negotiation-range mechanism is incentive compatible if truth telling promises 
the best ZOPA from the point of view of the player in question. That is, he 
would tell the truth if this strategy maximizes the upper and lower bounds on 
his utility as expressed by the ZOPA’s boundaries. Yet, on careful examination, it turns out (by [11]) that this goal cannot always hold. This is simply because such a mechanism could be easily modified to determine final prices for the players, e.g. by taking the average of the range’s boundaries. Here, the negotiation stage comes into play. It is shown that if the above utility-maximizing condition does not hold, then the player cannot influence the negotiation bound that is assigned to his matched bidder, no matter what value he declares. This means that the only thing that he may achieve by reporting a false valuation is modifying his own negotiation bound, something that he could alternatively achieve by reporting his true valuation and incorporating the effect of the modified negotiation bound into his negotiation strategy. This eliminates the benefit of reporting false valuations and allows the mechanism to compute the optimal gain from trade according to the players’ true values.

The problem of computing the optimal allocation that maximizes gain from 
trade for the heterogeneous single-unit setting can be conceptualized as the prob- 
lem of finding the maximum weighted matching in a weighted bipartite graph 
connecting buyers and sellers, where each edge in the graph is assigned a weight 
equal to the difference between the respective buyer bid and seller bid. It is well known that this problem can be solved efficiently in polynomial time. However, computing the optimal allocation that maximizes gain from trade in a combinatorial setting is NP-hard.
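To make the matching formulation concrete, here is a brute-force Python sketch for tiny single-unit instances. It is exponential-time and meant only to illustrate the objective; a polynomial-time maximum weighted bipartite matching algorithm (for example the Hungarian method) would be used in practice. The input format is a hypothetical one of ours: `bids[j][i]` is buyer j's bid for seller i's good (`None` if there is no bid), `asks[i]` is seller i's ask, and edges with non-positive gain are ignored.

```python
from typing import List, Optional, Tuple


def max_gain_matching(bids: List[List[Optional[int]]],
                      asks: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Exhaustive search for the buyer-seller matching maximizing total gain from trade."""
    best_total, best_pairs = 0, []

    def search(j: int, used: set, total: int, pairs: List[Tuple[int, int]]) -> None:
        nonlocal best_total, best_pairs
        if j == len(bids):
            if total > best_total:
                best_total, best_pairs = total, pairs[:]
            return
        search(j + 1, used, total, pairs)                  # buyer j stays unmatched
        for i, ask in enumerate(asks):
            bid = bids[j][i]
            if i not in used and bid is not None and bid - ask > 0:
                used.add(i)
                pairs.append((j, i))
                search(j + 1, used, total + bid - ask, pairs)   # edge weight = bid - ask
                pairs.pop()
                used.remove(i)

    search(0, set(), 0, [])
    return best_total, best_pairs
```

For instance, with bids `[[10, 8], [9, None]]` and asks `[5, 4]`, giving buyer 0 the good it values most (gain 5) is worse than the matching buyer 0 to seller 1 and buyer 1 to seller 0, which yields a total gain of 8.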

The combinatorial setting allows sellers of single-unit goods to join together 
to sell to combinatorial buyers, i.e. buyers who are interested in bundles. Buyers 
are not single-minded but are double-minded, i.e., can bid for two bundles. The 
sellers participate in a negotiation stage that resolves the ownership of the bundle 
to determine which seller will sell the bundle to the buyer. This setting limits 
the allocation problem’s complexity by allowing only disjoint bundles. 
A well known technique for designing IC mechanisms is the VCG payment scheme [3,8,15]. VCG IC payment schemes support efficient and IR bilateral trade but not simultaneously BB. The particular approach discussed in this paper adapts the VCG payment scheme to achieve budget balance.

In a combinatorial VCG market setting the problem of maintaining BB is compounded by the problem of maintaining IC when efficiency is approximated [10,12]. Even more so, according to [9] most “reasonable” IC combinatorial market settings can only apply VCG and therefore are not BB. ZOPAC’s restricted setting allows efficient allocation in polynomial time, IC is maintained with the VCG approach, and BB is maintained through the modifications to VCG.

The main result is the discovery that the price ranges output by the mechanisms protect them from some group manipulations by participants. It was recently pointed out in [6] that the output prices of a mechanism and group manipulations (by coalitions) are tightly linked: [6] shows that mechanisms with discriminating prices are not coalition proof and that IC mechanisms that maximize efficiency require discriminating prices, i.e., IC mechanisms that maximize efficiency are not coalition proof.

Generally, coalition-proof mechanisms are difficult to achieve. A specific case 
of coalition resistance called group strategyproof was looked at by [5] . In a group 
strategyproof environment no coalition of bidders can collude by bidding un- 
truthfully so that some members of the coalition strictly benefit without causing 
another member of the coalition to be strictly worse off. Such mechanisms do not 
resist side payments as they do not limit the extent to which a player gains from 
the loss of another. Negotiation-range mechanisms are coalition proof against 
tampering with the upper bound of the negotiation range, meaning that no 
coalition of buyers can improve the negotiation-starting condition of another 
buyer or benefit from making side payments to compensate for altering this 
condition. 

The coalition resistance property is made possible by the negotiation range 
mechanism’s price ranges. To the author’s knowledge the algorithms herein are 
the first coalition-resistant mechanisms known for a market setting. 

The rest of this paper is organized as follows. Section 2 describes the 
negotiation-range mechanism model and definitions. Section 3 presents the re- 
stricted combinatorial mechanism and shows that it is an IC mechanism that 
combines efficiency, IR and BB. Section 4 concludes with an exploration of 
the coalition-resistance property. The property presented applies both to the restricted combinatorial market mechanism and to the single-unit heterogeneous market presented in [1].

2 Negotiation Markets Preliminaries 

First a heterogeneous market setting where buyers can buy only a single good is defined. Later this setting will be extended to a restricted combinatorial market setting. Let $\Pi$ denote the set of players, $N$ the set of selling players, and $M$ the set of buying players, where $\Pi = N \cup M$. Let $\Psi = \{1, \ldots, k\}$ denote the set of goods.
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Ti{Ai) denote the A’th coordinate of the vector Ti. Let Ti € {—1,0, 1}^ denote 
an exchange vector for a trade, such that player i buys goods {A € (A) = 1} 

and sells goods {A g if'lTi (A) = —1}. Let T = (Ti, ...,T|/ 7 |) denote the complete 
trade between all players. Vj(Tj) defines buyer j’s reported value, and Vj(Tj) 
defines buyer j’s true value, for any possible exchange vector Tj. Ci{Ti) defines 
seller i’s reported value, and Ci{Ti) defines seller z’s true value, for any possible 
exchange vector Ti. For simplicity the reported value will be assumed to be a 
truth-telling value and denoted Vj{Tj), Ci{Ti), unless it is an assumed lie and 
would be denoted Vj{Tj), Ci(Ti) . All bids (buyers’ and sellers’) give a positive 
value for buying/selling a good, Ci{Ti) > 0, and Vj{Tj) > 0. Ci{T^) = oo z i, 
value oo for selling anything other than her good At. It is assumed that buyers 
and sellers have no externalities, meaning that buyers and sellers are indifferent 
to other participants’ allocations; free disposal, meaning that a buyer does not 
differentiate between allocations that assign him more than he requested; and 
normalization Uj(0) = 0. For simplicity sometimes Vj(Tj), Tj(Ai) = 1 will be 
denoted as Vj{i). Let T* denote the value-maximizing gain from trade, given 
reported values, Vj{Tj), and Ci{Ti), from each buyer and each seller with total 

value V* = J2 Ci {T* ) ■ Let T* denote the value-maximizing gain from 

j i 

trade given reported lie-telling values, and V* the total value of T* . Denote by 
Xi the function that gives —1 if there exists a coordinate A s.t. Ti{A) = — 1, 
and otherwise gives 0. Denote by 5i the function that gives 1 if there exists a 

coordinate A s.t. Ti{A) = 1, and otherwise gives 0. Player i obtains at least 

utility Uf{ci) = {xi + 1) ' Ci(T'i) + Li for selling and player j obtains at least 

utility Uj {vj) = Sj ■ Vj{Tj) — Hj for buying where L G and H € are 

vectors of payments and Li = 0, Hj = 0 for Ti{Ai) = 0, Tj{Ai) = 0 respectively. 
Sellers are allowed to place a valuation on the good they are selling and on other 
goods they wish to buy, but are not allowed to bid to buy their own good. 
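To make the utility expressions concrete, here is a minimal Python sketch. It assumes an exchange vector is a tuple over {-1, 0, 1} and that the relevant value $c_i(T_i)$ or $v_j(T_j)$ has already been evaluated to a number; the function names are hypothetical.

```python
def chi(T):
    """chi_i: -1 if the exchange vector T sells some good, otherwise 0."""
    return -1 if -1 in T else 0

def delta(T):
    """delta_j: 1 if the exchange vector T buys some good, otherwise 0."""
    return 1 if 1 in T else 0

def seller_utility(T_i, c_i, L_i):
    """Lower bound on seller utility: (chi_i + 1) * c_i(T_i) + L_i."""
    # A seller who sells gets only the payment L_i; one who does not keeps c_i.
    return (chi(T_i) + 1) * c_i + L_i

def buyer_utility(T_j, v_j, H_j):
    """Lower bound on buyer utility: delta_j * v_j(T_j) - H_j."""
    return delta(T_j) * v_j - H_j

print(seller_utility((-1, 0), 4, 6))  # sells good 0 with floor 6 -> 6
print(buyer_utility((1, 0), 10, 7))   # buys good 0 with ceiling 7 -> 3
```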

2.1 Negotiation-Range Mechanisms 

In designing negotiation-range mechanisms, the goal is to provide the players
with a range of prices within which they can negotiate the final terms of the deal
by themselves. The mechanism should provide the buyer with the upper bound 
on the price range and the seller with the lower bound on the price range. This 
gives each player a promise that it will be beneficial for them to close the deal, 
but does not provide information about the other player’s negotiation terms. 

Definition 1. [1] Negotiation Range: Zone Of Possible Agreements, ZOPA, between
a matched buyer and seller. The ZOPA is a range, $(L, H)$, $0 \le L \le H$, where
$H$ is an upper bound (ceiling) price for the buyer and $L$ is a lower bound
(floor) price for the seller.



Definition 2. [1] Negotiation-Range Mechanism: A mechanism that computes
a ZOPA, $(L, H)$, for each matched buyer and seller in $T^*$, and provides the
buyer with the upper bound $H$ and the seller with the lower bound $L$.
One of the properties of negotiation-range mechanisms is coalition proofness for
buyers' ceilings.

Definition 3. Ceiling Coalition Proof: A mechanism is ceiling coalition proof if
the following holds for all $S$, $v$ and $\tilde{v}$ (where
$S = \{j \mid \tilde{v}_j \ne v_j\}$ is the defecting group):
$U_j(v_j) \ge U_j(\tilde{v}_j)$ for all $j \in S$. That is, no member of the group
benefits from the group's colluding and lying to the mechanism.



3 The Restricted Combinatorial Setting (ZOPAC) 

ZOPAC employs two distinct negotiation processes that define the behavior of 
individual sub-markets, a primary market and a secondary market. The primary 
market begins by collecting bids and determining which sellers will be allowed 
to sell together. The primary market then hands control over to the secondary
market, which uses the negotiation principle to resolve the ownership of bundles
such that one seller owns each bundle. Finally, control is returned to the primary 
market where buyers and sellers negotiate as in ZOPAS [1] to complete their 
transactions. 

Assume that there are n sellers, each bringing one unit of an indivisible, 
heterogeneous good to the market. Also assume there are m buyers, each inter- 
ested in buying at most one bundle of heterogeneous goods. Each seller privately 
submits a bid for the good she owns and optionally declares the clique she is 
interested in selling with. Seller cliques are disjoint groups of sellers willing to 
cooperate to sell a bundle together. 

Definition 4. Bundle: A bundle is either a single good owned by a single seller 
or a collection of two or more goods owned by a clique. 

Definition 5. Seller Clique: A clique $Q_t \subseteq N$ is a group of two or more
sellers selling their goods as a bundle $q_t$, with $Q_i \cap Q_j = \emptyset$ for
$i \ne j$.

Definition 6. Dominant Seller: A seller who ultimately represents a clique 
when negotiating with a bundle’s buyer. 

For each clique, ZOPAC creates a dummy seller with a bid equal to the
sum of the sellers' bids for the group's component goods. Clique $Q_t$'s dummy
bid is $c_t(T_t) = \sum_{i \in Q_t} c_i(T_i)$. Once the sellers have stated their
clique membership, each buyer bids on two bundles knowing that he will be
assigned at most one bundle.

The mechanism now computes the ZOPAs for the primary market and es- 
tablishes the negotiation terms between the buyers, sellers, and seller cliques. 

The last stage is divided into three major steps. The first step computes $T^*$
by solving the maximum weighted bipartite matching problem on the bipartite
graph $K_{q,m}$, constructed by placing the $m$ buyers on one side of the graph
and the $q$ bundles' sellers on the other, and giving the edge between buyer $j$
and bundle seller or clique $i$ weight $v_j(i) - c_i(T^*_i)$. The maximum
weighted matching problem is solvable in polynomial time (e.g., using the
Hungarian method). The result of this step is a pairing between buyers and
bundles' sellers that maximizes gain from trade.
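The dummy-seller construction and this first step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `optimal_trade` enumerates all assignments (exponential time, only for tiny markets) as a stand-in for the Hungarian method mentioned above, a buyer may stay unmatched, and a bundle a buyer did not bid on has no edge. All names are hypothetical.

```python
from itertools import permutations

def dummy_seller_bids(seller_bids, cliques):
    """c_t(T_t) = sum of member bids per clique; sellers outside cliques keep their bid."""
    in_clique = {s for members in cliques.values() for s in members}
    bids = {t: sum(seller_bids[s] for s in members) for t, members in cliques.items()}
    bids.update({s: c for s, c in seller_bids.items() if s not in in_clique})
    return bids

def optimal_trade(values, costs):
    """Brute-force maximum weighted bipartite matching.

    values[j][b]: buyer j's bid for bundle b; costs[b]: (dummy) seller bid for b.
    Returns (gain from trade, {buyer: bundle}).
    """
    buyers = list(values)
    slots = list(costs) + [None] * len(buyers)  # None = buyer left unmatched
    best = (0, {})
    for choice in permutations(slots, len(buyers)):
        pairs = [(j, b) for j, b in zip(buyers, choice) if b is not None]
        if any(b not in values[j] for j, b in pairs):
            continue  # uses an edge the buyer never bid on
        gain = sum(values[j][b] - costs[b] for j, b in pairs)  # sum of v_j(i) - c_i
        if gain > best[0]:
            best = (gain, dict(pairs))
    return best

costs = dummy_seller_bids({"a": 2, "b": 3, "c": 4}, {"ab": ["a", "b"]})
print(optimal_trade({"j1": {"ab": 9, "c": 6}, "j2": {"ab": 7, "c": 8}}, costs))
```

In this toy market the clique {a, b} becomes a dummy seller with bid 5, and the best pairing gives bundle ab to j1 and good c to j2, for a gain of 8.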

In the second step the algorithm computes a lower bound for each pairing’s 
ZOPA (a dummy seller floor) and assigns it to the dummy seller. Each dummy 
seller’s floor is computed by calculating the difference between the total gain 
from trade when the buyer is excluded and the total gain from trade of the 
other participants when the buyer is included (the VCG Principle). 

Let $(V_{-j})^*$ denote the gain from trade on the value-maximizing trade when
player $j$'s buyer bids are discarded but player $j$'s seller or clique bids are
not discarded.

Definition 7. Seller Floor: [1] The lowest price the seller should expect to
receive, communicated to the seller by the mechanism. The seller floor for player
$i$ who was matched with buyer $j$ on bundle $A_i$, i.e., $T^*_j(A_i) = 1$ and
$A_i \in \Psi$, is computed as follows: $L_i = v_j(T^*) + (V_{-j})^* - V^*$. The
seller is instructed not to accept less than this price from her matched buyer.

In the third step the algorithm computes the upper bound for each pairing’s 
ZOPA (a buyer ceiling) and assigns it to the buyer. Each buyer’s ceiling is 
computed by removing the buyer’s matched seller and calculating the difference 
between the total gain from trade when the buyer is excluded and the total gain 
from trade of the other participants when the buyer is included. 

Let $(T^{-i})^*$ denote the value-maximizing trade when seller $i$ is removed
from the trade, and denote by $(V^{-i})^*$ its total gain from trade,
$(V^{-i})^* = \sum_j v_j\big((T^{-i})^*_j\big) - \sum_{l \ne i} c_l\big((T^{-i})^*_l\big)$.
Let $(V^{-i}_{-j})^*$ denote the gain from trade on the value-maximizing trade
$(T^{-i})^*$ when player $j$'s buyer bids are discarded but player $j$'s seller
or clique bids are not discarded.

Definition 8. Buyer Ceiling: [1] The highest price the buyer should expect to
pay, communicated to the buyer by the mechanism. The buyer ceiling for player
$j$ on good $A_i \in \Psi$ is computed as follows:
$H_j = v_j(T^*) + (V^{-i}_{-j})^* - (V^{-i})^*$, where $T^*_j(A_i) = 1$. The buyer
is instructed not to pay more than this price to his matched seller.
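A sketch of how the floor and ceiling formulas could be evaluated on a tiny market, again with a brute-force matcher standing in for the Hungarian method. One interpretation is ours, not the paper's: we read $(V^{-i}_{-j})^*$ as the optimal gain from trade after removing both seller $i$ and buyer $j$, following the verbal description of the third step. All names are hypothetical.

```python
from itertools import permutations

def optimal_trade(values, costs):
    """Brute-force maximum weighted bipartite matching (tiny markets only)."""
    buyers = list(values)
    slots = list(costs) + [None] * len(buyers)  # None = buyer left unmatched
    best = (0, {})
    for choice in permutations(slots, len(buyers)):
        pairs = [(j, b) for j, b in zip(buyers, choice) if b is not None]
        if any(b not in values[j] for j, b in pairs):
            continue  # uses an edge the buyer never bid on
        gain = sum(values[j][b] - costs[b] for j, b in pairs)
        if gain > best[0]:
            best = (gain, dict(pairs))
    return best

def zopa(values, costs):
    """Negotiation range (floor L, ceiling H) for every matched buyer in T*."""
    v_star, match = optimal_trade(values, costs)
    ranges = {}
    for j, i in match.items():
        v_ji = values[j][i]                                      # v_j(T*)
        without_j = {k: bids for k, bids in values.items() if k != j}
        without_i = {b: c for b, c in costs.items() if b != i}
        # Seller floor: L_i = v_j(T*) + (V_{-j})* - V*
        floor = v_ji + optimal_trade(without_j, costs)[0] - v_star
        # Buyer ceiling: H_j = v_j(T*) + (V^{-i}_{-j})* - (V^{-i})*
        # ASSUMPTION: (V^{-i}_{-j})* is re-optimized without seller i and buyer j.
        ceiling = (v_ji + optimal_trade(without_j, without_i)[0]
                   - optimal_trade(values, without_i)[0])
        ranges[j] = (floor, ceiling)
    return ranges

print(zopa({"j": {"i": 10, "k": 10}}, {"i": 4, "k": 6}))  # {'j': (4, 6)}
print(zopa({"j": {"i": 10}, "z": {"i": 7}}, {"i": 4}))    # {'j': (7, 10)}
```

In the first call one buyer faces two sellers, so the floor is the winning seller's bid and the ceiling is the losing seller's bid; in the second, a competing buyer raises the floor to his own bid.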

The third stage begins once the mechanism computes the ceiling and the 
floor for every bundle. The mechanism reports the floor to the seller or the 
clique and the ceiling to the buyer. ZOPAC now settles the deals between sellers
in their cliques to determine which one is the dominant seller; this process occurs in
the secondary market. Negotiations in the secondary market are conducted by 
constructing a series of markets as follows: 

1. Each seller $i$ submits one bid $v^t_i$ for the entire bundle (with the
exception of her own good). The seller with the highest bid becomes the current
dominant seller.
2. The current dominant seller splits her bid into bids for each of the other
goods in the bundle.

3. ZOPAC allocates goods to the dominant seller in a way that maximizes the
gain from trade. ZOPAC also limits the dominant seller to a total final price,
denoted $c^t$, that does not exceed $v^t_i - \sum_j c_j(T^*_j)$, where the sum
ranges over the sellers $j$ whose goods were not allocated to the dominant seller.

4. After the current dominant seller closes her deals, ZOPAC finds a new current 
dominant seller among the sellers still holding goods, constructs a new market 
representing the present state of consolidation, and the process is repeated. 

5. The secondary market concludes when each clique has one seller that owns 
the group’s goods. 

6. The dominant sellers then negotiate with their bundle’s original buyer to 
reach a final price for the bundle. 
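One round of the secondary market might look like the sketch below. It assumes the dominant seller's per-good bids are additive, so the gain-maximizing allocation simply takes every good $g$ with $v_i(g) - c_g(T^*_g) \ge 0$, and it stops short of the actual bilateral negotiation: it only returns the cap on $c^t$. Names and data layout are hypothetical.

```python
def secondary_round(bundle_bids, splits, costs):
    """One round of the secondary market for a single clique (sketch).

    bundle_bids[i]: seller i's bid v_i^t for the other goods of the bundle.
    splits[i][g]:   how seller i splits that bid over the goods g she lacks.
    costs[g]:       reported cost c_g(T*_g) of the seller still holding good g.
    Returns (dominant seller, goods allocated to her, cap on her total price c^t).
    """
    dominant = max(bundle_bids, key=bundle_bids.get)           # step 1
    per_good = splits[dominant]                                 # step 2
    if abs(sum(per_good.values()) - bundle_bids[dominant]) > 1e-9:
        raise ValueError("the split must add up to the bundle bid")
    # Step 3: with additive per-good bids, gain is maximized by taking
    # every good whose gain v_i(g) - c_g is non-negative.
    allocated = [g for g, v in per_good.items() if v - costs[g] >= 0]
    unallocated_cost = sum(costs[g] for g in per_good if g not in allocated)
    price_cap = bundle_bids[dominant] - unallocated_cost        # c^t <= v^t - sum c_j
    return dominant, allocated, price_cap

print(secondary_round({"a": 7, "b": 5, "c": 6},
                      {"a": {"b": 4, "c": 3}},
                      {"b": 3, "c": 4}))  # ('a', ['b'], 3)
```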



ZOPAC

Primary Market
    For each seller $i$
        Record $i$'s sealed bid $c_i(T^*_i)$ for her good $A_i$.
        Record the seller clique $Q_t$ that $i$ is willing to sell with.
    For every $Q_t$ construct the dummy seller $c_t(T^*_t) = \sum_{i \in Q_t} c_i(T^*_i)$.
    Each buyer submits bids for two bundles.
    Compute the optimal solution $T^*$.
    For every buyer $j$ with allocated bundle $t$, compute $L$ and $H$ according to ZOPAS:
        $L_t = v_j(t) + (V_{-j})^* - V^*$
        $H_j = v_j(t) + (V^{-t}_{-j})^* - (V^{-t})^*$
    For every clique $Q_t$
        Reveal $c_t(T_t)$ and $L_t$ and find the dominant seller.
    For every buyer $j$
        Report his ceiling $H_j$ and identify his matched dominant seller $i$.
        Conduct buyer-seller negotiation according to ZOPAS.

Secondary Market
    Construct an exchange:
    1. Every seller $i \in Q_t$ bids $v^t_i \in [L_t - c_i(T^*_i),\ c_t(T_t) - c_i(T^*_i)]$.
    2. Choose $i$ such that $v^t_i$ is maximal.
    3. $i$ bids $v_i(j)$ on each $j \in Q_t$, $j \ne i$, such that $\sum_j v_i(j) = v^t_i$.
    4. Compute $\sum_f c_f(T^*_f)$ over the sellers $f$ such that $v_i(f) - c_f(T^*_f) < 0$.
    5. Conduct negotiation between $i$ and each $j$ such that $v_i(j) - c_j(T^*_j) > 0$
       and $c^t \le v^t_i - \sum_f c_f(T^*_f)$.
    6. Update the bound of $i$'s bid $v^t_i$.
    Repeat 1-6 until there exists $i$ that owns all goods of sellers in $Q_t$.
3.1 Analysis of the Algorithm

In this section we analyze the properties of the ZOPAC mechanism. All proofs
are given in the full paper [7].

Theorem 1. The ZOPAC market negotiation-range mechanism is an
incentive-compatible bilateral trade mechanism that is efficient, individually
rational and budget balanced.

One can see that ZOPAC's primary market is an efficient polynomial-time
mechanism, efficient in the sense that players are always better off closing deals
than not. The process of finding the dominant seller in the secondary market is
polynomial, since in every round of the secondary market at least one seller
settles and is not included in the next round of consolidations. The remainder
of this section shows that ZOPAC satisfies the theorem above.

Claim. ZOPAC is individually rational, i.e., for every dominant seller $i$ of a
clique $Q_t$ and every buyer $j$ in the primary market,
$U_t(c_t) = (\chi_t + 1) \cdot c_t(T_t) + L_t \ge \max\{v^t_i + c_i(T^*_i),\ c_t(T^*_t)\}$
and $U_j(v_j) = \delta_j \cdot v_j(T_j) - H_j \ge 0$, and for every seller $i$ in the
secondary market $U_i(c_i) = (\chi_i + 1) \cdot c_i(T_i) + L_i \ge c_i(T^*_i)$.

ZOPAC's budget-balanced property in the primary market follows from
Lemma 1 in [1], which ensures the validity of the negotiation range, i.e., that every
dummy seller’s floor is below her matched buyer’s ceiling. In the secondary mar- 
ket the final price is reached between the seller’s cost and the dominant seller’s 
willingness to pay, and the mechanism only chooses valid pairings. 



Weakly Dominant Strategies in Negotiation-Range Mechanisms.

Negotiation-range mechanisms assign bounds on prices rather than final prices 
and therefore a mechanism only knows each player’s minimum and maximum 
utilities for his optimal deal. A basis for defining a weakly dominant strategy for 
a negotiation mechanism would then be to say that a player’s minimum utility 
for his optimal deal should at least match the minimal utility gained with other 
strategies and a player’s maximum utility for his optimal deal should at least 
match his maximum utility gained with other strategies. 

The definitions of the negotiation-range weakly dominant strategy for the primary
market and the secondary market are given in the full paper [7].

The key notion of this paper is the use of negotiation as a way for buyers 
and sellers to determine their final price. If all conditions of the negotiation- 
range weakly dominant strategy always hold, in both the primary market and 
the secondary market, then one can set the buyer and dummy seller’s final 
price to the average of the ceiling and the floor in the primary market and the 
mechanism would be incentive compatible in the traditional sense, i.e., it would 
not be a negotiation-range mechanism. A similar averaging can be applied in the 
secondary market. 
The negotiation principle is required when some of the negotiation-range
weakly dominant strategy conditions do not hold. In these cases it is claimed
that the buyer can calculate the diverted ceiling $\hat{H}_j$ and the seller the
diverted floor $\hat{L}_i$ (resp. the dominant seller the diverted floor $\hat{L}_t$).
This allows a bidder to report his true value to the mechanism and use the
diverted bound during the negotiation.

Definition 9. A negotiation-range mechanism is incentive compatible if for every
buyer $j$ and seller $i$ in the mechanism (primary market or secondary market),
truth-telling is their negotiation-range weakly dominant strategy, or buyer $j$
(resp. seller $i$) can compute $\hat{H}_j$ (resp. $\hat{L}_i$ or $\hat{L}_t$) and use
the computed value during the negotiation.

Theorem 2. ZOPAC is an incentive compatible negotiation-range mechanism 
for buyers and dominant sellers in the primary market, and for sellers in their 
secondary market. 

The proof of Theorem 2 is composed of a number of claims, each proving
a minimum or maximum utility condition for the cases where the allocation
did or did not change as a result of the player's lie.

4 Coalition-Resistant Market

This section shows another advantage of leaving the matched buyers and sellers 
with a negotiation range rather than a final price. The negotiation range also 
“protects” the mechanism from some manipulations. For simplicity this section 
will refer to a heterogeneous setting. The findings apply to ZOPAC’s setting and 
to ZOPAS [1]. All the proofs of this section are given in the full paper [7]. 

In a market there are numerous ways participants can collude to improve 
their gain. Buyers can collude, sellers can collude or a mixed coalition of buyers 
and sellers can work together. This section deals with coalitions of buyers. 

In a negotiation-range mechanism there are two values that a coalition can
attempt to divert: the negotiation ceiling and the floor. Since the floors are
essentially VCG prices, it is known that they cannot resist manipulation, i.e., it
is possible for a losing buyer to impact the floor of the seller matched with the
winning buyer. However, it is impossible for any buyer to help another buyer by
manipulating his ceiling.

In section 3 it was shown that whenever the negotiation-range weakly domi- 
nant strategy does not hold, buyer j can calculate his diverted ceiling given the 
ceiling resulting from his truth-telling strategy. The basis of the coalition-proof 
property is that no other lie stated by the coalition members can lead to a lower 
ceiling than j's calculated ceiling. In other words, j can help himself more than
any other buyer in the market, so no coalition of buyers can collude by bidding
untruthfully so that some members of the coalition can improve their calculated
negotiation terms, i.e., their ceiling. The following definitions formalize the
coalition-resistant property in negotiation-range markets.
Let $\hat{H}_z$ be the ceiling calculated by buyer $z$ for the negotiation phase
when the mechanism assigned the ceiling $H_z$ to $z$. Let $H^{(j)}_z$ be the
ceiling given by the mechanism when buyer $j$ lies in order to help buyer $z$.

Definition 10. A negotiation-range market is ceiling coalition proof for buyers
if no coalition of buyers $S = \{j \mid \tilde{v}_j(T^*) \ne v_j(T^*)\}$ can collude by
bidding untruthfully so that some member $z \in S$ of the coalition improves his
calculated ceiling utility, i.e., for all $z$ with $T^*_z(A_g) = 1$,
$\tilde{T}^*_z(A_f) = 1$ and $j \in S$: $v_z(g) - \hat{H}_z \ge v_z(f) - H^{(j)}_z$.
Theorem 3. ZOPAC and ZOPAS [1] are negotiation-range markets that are
ceiling coalition proof for buyers. That is, for every buyer $j$ and every buyer $z$
such that $j \ne z$, $T^*_j(A_i) = 1$, $T^*_z(A_g) = 1$, $\tilde{T}^*_z(A_f) = 1$, and
for every lie buyer $j$ can tell, $\tilde{v}_j(i) \ne v_j(i)$ or $\tilde{v}_j(k) \ne v_j(k)$,
$z$ cannot improve his calculated ceiling utility.

The proof of theorem 3 begins by showing that buyer j cannot help buyer z 
if the optimal allocation is not affected by j’s lie. 

Claim. Let $j$ be a buyer matched to seller $i$ and let $z$ be a buyer matched to
seller $g$ in $T^*$ and in $\tilde{T}^*$. Then,

$$\hat{H}_z \le H^{(j)}_z. \qquad (1)$$

The proof of the claim begins by showing that the amount by which buyer 
z can reduce his ceiling is more than the amount buyer j can change his own 
valuation when lying. The proof of the claim continues by showing that all 
possible allocations that pair buyer j with seller i imply (1). 

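Let $\hat{H}_z = H_z - v_z(g) + \hat{v}_z(g) = \hat{v}_z(g) + (V^{-g}_{-z})^* - (V^{-g})^*$,
where $\hat{v}_z(g)$ is the value that $z$ uses to compute his ceiling. Recall that
$H^{(j)}_z = v_z(g) + (\tilde{V}^{-g}_{-z})^* - (\tilde{V}^{-g})^*$, where
$(\tilde{T}^{-g}_{-z})^*$ and $(\tilde{T}^{-g})^*$ are the allocations resulting from
$j$'s lie. It follows that in order to prove (1) it is necessary to show

$$(V^{-g}_{-z})^* + (\tilde{V}^{-g})^* \le v_z(g) - \hat{v}_z(g) + (\tilde{V}^{-g}_{-z})^* + (V^{-g})^*. \qquad (2)$$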

The proof of (2) is as follows. Given the allocations $(\tilde{T}^{-g})^*$ and
$(T^{-g}_{-z})^*$, which together have value equal to
$(\tilde{V}^{-g})^* + (V^{-g}_{-z})^*$, one can use the same pairs matched in these
allocations to create a new pair of valid allocations. Consequently, the sum of
values of the new allocations equals the sum of values of the original pair of
allocations. It is also required that one of the new allocations does not include
seller $g$ and is based on the true valuation, while the other allocation includes
neither buyer $z$ nor seller $g$ and is based on the false valuation. This means
that the sum of values of these new allocations is at most
$(V^{-g})^* + (\tilde{V}^{-g}_{-z})^*$, which proves (2).

The proof of Theorem 3 is concluded by showing that buyer $j$ cannot help
buyer $z$ even when the optimal allocation changes as a result of $j$'s lie.

Claim. Let $j$ be a buyer matched to seller $i$ and let $z$ be a buyer matched to
seller $g$ in $T^*$, and let $k \ne i$ and $f$ be sellers matched with $j$ and $z$
respectively in $\tilde{T}^*$. Then,

$$v_z(g) - \hat{H}_z \ge v_z(f) - H^{(j)}_z. \qquad (3)$$
5 Conclusions and Extensions

This paper showed that negotiation-range mechanisms exhibit coalition resis- 
tance, and introduced a restricted combinatorial setting. 

The most general and desirable combinatorial market would allow each seller 
to sell a bundle of goods and allow buyers to buy bundles that are not necessarily 
the exact bundles for sale. This mechanism would require a complex negotiation 
scheme between the sellers. 

Coalition proof is an interesting property of the ZOPAS and ZOPAC bilateral 
trade mechanisms, but in the present embodiment the floors are not coalition 
proof in the same way that VCG mechanisms are not coalition proof. An inter- 
esting direction for future research is the question: Can one define the floor on a 
basis other than VCG such that the new floor would be coalition proof as well? 
Abstract. Let $G = (V, E)$ be an undirected multi-graph with a special
vertex $root \in V$, and where each edge $e \in E$ is endowed with a length
$l(e) \ge 0$ and a capacity $c(e) > 0$. For a path $P$ that connects $u$ and $v$,
the transmission time of $P$ is defined as
$t(P) = \sum_{e \in P} l(e) + \max_{e \in P} \frac{1}{c(e)}$. For a spanning tree
$T$, let $P^T_{u,v}$ be the unique $u-v$ path in $T$. The QUICKEST RADIUS
SPANNING TREE PROBLEM is to find a spanning tree $T$ of $G$ such that
$\max_{v \in V} t(P^T_{root,v})$ is minimized. In this paper we present a
2-approximation algorithm for this problem, and show that unless P = NP, there
is no approximation algorithm with performance guarantee of $2 - \epsilon$ for
any $\epsilon > 0$. The QUICKEST DIAMETER SPANNING TREE PROBLEM is to find a
spanning tree $T$ of $G$ such that $\max_{u,v \in V} t(P^T_{u,v})$ is minimized.
We present a $\frac{3}{2}$-approximation to this problem, and prove that unless
P = NP there is no approximation algorithm with performance guarantee of
$\frac{3}{2} - \epsilon$ for any $\epsilon > 0$.



1 Introduction 

Let $G = (V, E)$ be an undirected multi-graph where each edge $e \in E$ is endowed
with a length $l(e) \ge 0$ and a capacity $c(e) > 0$. For an edge $e = (u, v) \in E$,
the capacity of $e$ is the amount of data that can be sent from $u$ to $v$ along
$e$ in a single time unit. The reciprocal capacity along $e$ is $r(e) = \frac{1}{c(e)}$.
The length of $e$ is the time required to send data from $u$ to $v$ along $e$. If we
want to transmit data of size $\sigma$ along $e$, then the time it takes is the
transmission time, $t(e) = l(e) + \frac{\sigma}{c(e)}$. For a path $P$ that connects
$u$ and $v$, the length of $P$ is defined as $l(P) = \sum_{e \in P} l(e)$, the
reciprocal capacity of $P$ is defined as $r(P) = \max_{e \in P} r(e)$, and the
transmission time of $P$ is defined as $t(P) = l(P) + \sigma r(P)$. Denote
$n = |V|$ and $m = |E|$.
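A short sketch of the transmission-time formula, assuming a path is given as a list of $(l(e), c(e))$ pairs; `transmission_time` is a hypothetical helper. An infinite capacity gives $r(e) = 0$.

```python
def transmission_time(path, sigma=1.0):
    """t(P) = l(P) + sigma * r(P) for a path given as (l(e), c(e)) pairs."""
    length = sum(l for l, _ in path)        # l(P): edge lengths add up
    recip = max(1.0 / c for _, c in path)   # r(P): set by the lowest-capacity edge
    return length + sigma * recip

print(transmission_time([(1, 2), (3, 4)], sigma=8))  # 4 + 8 * 0.5 = 8.0
```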

A path $P$ that connects $u$ and $v$ is a $u-v$ path. For a spanning tree $T$, let
$P^T_{u,v}$ be the unique $u-v$ path in $T$.

The QUICKEST PATH PROBLEM (QPP) is to find, for a given pair of vertices
$u, v \in V$, a $u-v$ path $P$ whose transmission time is minimized. Moore [8]
introduced this problem and presented an $O(m^2 + mn \log n)$-time algorithm that
solves the problem for all possible values of $\sigma$. Chen and Chin [4] showed an
$O(m^2 + mn \log n)$-time algorithm that solves the problem for a common source
$u$ and all destinations $v$ for all possible values of $\sigma$. Their algorithm
creates a data structure that encodes the quickest paths for all possible values
of $\sigma$. Using this data structure, it is possible to obtain an optimal path for
a single value of $\sigma$ in $O(\log n)$ time (after the $O(m^2 + mn \log n)$-time
preprocessing). Martins and Santos [7] provided a different algorithm for the same
problem with the same time complexity and a reduced space complexity. Rao [10]
considered the problem with a single value of $\sigma$ and a given pair $u, v$. He
designed a probabilistic algorithm with running time $O(pm + pn \log n)$ that has a
high probability (that depends on $p$) of computing a path with transmission time
close to the optimal one. The all-pairs shortest paths variant of this problem is
solved in $O(mn^2)$ time (after the $O(m^2 + mn \log n)$-time preprocessing) for all
possible values of $\sigma$; moreover, the algorithm constructs a data structure such
that it is possible to get an optimal path for a single value of $u$, $v$ and
$\sigma$ in $O(\log n)$ time. This
algorithm to solve the all-pairs variant appeared in both Chen and Hung [1] and 
Lee and Papadopoulou [6]. Rosen, Sun and Xue [11] and Pelegrin and Fernan- 
dez [9] considered the bi-criteria minimization problem of finding a u — v path 
whose length and reciprocal capacity are both minimized. These objectives are 
sometimes contradicting, and [11,9] presented algorithms that compute the effi- 
cient path set. Chen [2,3] and Rosen, Sun and Xue [11] considered the k quickest 
simple path problem. 

In this paper we consider problems with a single value of $\sigma$. W.l.o.g. we can
assume that $\sigma = 1$ (otherwise we can scale $c$). Therefore, for a $u-v$ path
$P$, $t(P) = l(P) + r(P)$.

Many network design problems restrict the solution network to be a tree 
(i.e., a subgraph without cycles). Usually such a constraint results from the 
communication protocols used in the network. In broadcast networks, one usually 
asks for a small maximum delay of a message sent by a root vertex to all the 
users (all the other vertices) of the network. In other communication networks, 
one seeks a small upper bound on the delay of transmitting a message between 
any pair of users (vertices). These objectives lead to the following two problems: 

The QUICKEST RADIUS SPANNING TREE PROBLEM is to find a spanning tree
$T$ of $G$ such that $rad_t(T) = \max_{v \in V} t(P^T_{root,v})$ is minimized.

The QUICKEST DIAMETER SPANNING TREE PROBLEM is to find a spanning
tree $T$ of $G$ such that $diam_t(T) = \max_{u,v \in V} t(P^T_{u,v})$ is minimized.

The shortest paths tree (SPT) is defined as follows: given an undirected
multi-graph $G = (V, E)$ with a special vertex $root \in V$, and where each edge
$e \in E$ is endowed with a length $l(e) \ge 0$, the SPT is a spanning tree
$T = (V, E_T)$ such that for every $u \in V$ the unique $root-u$ path in $T$ has
the same total length as the shortest path in $G$ that connects $root$ and $u$.
This tree can be found in polynomial time by applying Dijkstra's algorithm from
$root$. We denote the returned tree by $SPT(G, root, l)$.
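A minimal Dijkstra-based sketch of $SPT(G, root, l)$ using Python's `heapq`; the edge-list layout and the function name are assumptions. Because $G$ may be a multi-graph, each parent pointer records the edge index as well as the parent vertex.

```python
import heapq

def shortest_path_tree(n, edges, root):
    """SPT(G, root, l) by Dijkstra's algorithm.

    Vertices are 0..n-1 and edges[k] = (u, v, l) with l >= 0 (parallel edges allowed).
    Returns (dist, parent) where parent[v] = (u, k): tree edge k joins v to u.
    """
    adj = [[] for _ in range(n)]
    for k, (u, v, _) in enumerate(edges):
        adj[u].append((v, k))
        adj[v].append((u, k))
    dist = [float("inf")] * n
    parent = [None] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, k in adj[u]:
            nd = d + edges[k][2]
            if nd < dist[v]:
                dist[v], parent[v] = nd, (u, k)
                heapq.heappush(heap, (nd, v))
    return dist, parent

print(shortest_path_tree(3, [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 5.0)], 0)[0])  # [0.0, 2.0, 4.0]
```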

To define the absolute 1-center of a graph $G = (V, E)$ where each edge $e \in E$
is endowed with a length $l(e) \ge 0$, we refer also to interior points on an edge
by their distances (along the edge) from the two end-vertices of the edge. We let
$A(G)$ denote the continuum set of points on the edges of $G$; $l$ induces a length
function over $A(G)$. For $x, y \in A(G)$, let $l_G(x, y)$ denote the length of a
shortest path in $A(G)$ connecting $x$ and $y$. The absolute 1-center is a
minimizer $x$ of the function $F(x) = \max_{v \in V} l_G(x, v)$.

The MINIMUM DIAMETER SPANNING TREE PROBLEM is defined as follows:
given an undirected multi-graph $G = (V, E)$ where each edge is endowed with
a length $l(e) \ge 0$, we want to find a spanning tree $T = (V, E_T)$ such that
$diam_l(T) = \max_{u,v \in V} l(P^T_{u,v})$ is minimized. This problem is solved [5]
in polynomial time by computing a shortest path tree from an absolute 1-center
of $G$.
Our results:

1. A 2-approximation algorithm for the QUICKEST RADIUS SPANNING TREE PROBLEM.

2. For any $\epsilon > 0$, unless P = NP there is no approximation algorithm with
performance guarantee of $2 - \epsilon$ for the QUICKEST RADIUS SPANNING TREE
PROBLEM.

3. A $\frac{3}{2}$-approximation algorithm for the QUICKEST DIAMETER SPANNING TREE
PROBLEM.

4. For any $\epsilon > 0$, unless P = NP there is no approximation algorithm with
performance guarantee of $\frac{3}{2} - \epsilon$ for the QUICKEST DIAMETER
SPANNING TREE PROBLEM.

2 A 2-Approximation Algorithm for the Quickest Radius 
Spanning Tree Problem 

Algorithm quick_radius:

1. For every $u \in V \setminus \{root\}$, compute $QP_{root,u}$, that is, a quickest
path from $root$ to $u$ in $G$.

2. Let $G' = (V, E') = \bigcup_{u \in V \setminus \{root\}} QP_{root,u}$.

3. $T' = SPT(G', root, l)$.

4. Return $T'$.
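The sketch below runs Algorithm quick_radius end to end for $\sigma = 1$. For Step 1 it substitutes a simple threshold method rather than the Chen and Chin algorithm cited in the text: for every distinct reciprocal capacity $\rho$, run Dijkstra on the edges with $r(e) \le \rho$ and keep the smallest value of $dist + \rho$. This yields a quickest path for a single $\sigma$, at the cost of one Dijkstra call per distinct capacity. It assumes $G$ is connected and edges are `(u, v, l, r)` tuples; all names are hypothetical.

```python
import heapq

def dijkstra(n, edges, allowed, root):
    """Shortest paths by length l(e) from root, using only edge indices in `allowed`."""
    adj = [[] for _ in range(n)]
    for k in allowed:
        u, v = edges[k][0], edges[k][1]
        adj[u].append((v, k))
        adj[v].append((u, k))
    dist, parent = [float("inf")] * n, [None] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, k in adj[u]:
            nd = d + edges[k][2]
            if nd < dist[v]:
                dist[v], parent[v] = nd, (u, k)
                heapq.heappush(heap, (nd, v))
    return dist, parent

def path_edges(parent, v):
    """Edge indices on the recorded path from v back to the root."""
    out = []
    while parent[v] is not None:
        v, k = parent[v]
        out.append(k)
    return out

def quick_radius(n, edges, root):
    """edges[k] = (u, v, l, r), sigma = 1. Returns (tree edge indices, rad_t(T'))."""
    best = {}  # u -> (t, edge indices of a quickest root-u path)
    # Step 1 (threshold method): a shortest path over edges with r(e) <= rho
    # has transmission time at most dist + rho; the minimum over rho is optimal.
    for rho in sorted({e[3] for e in edges}):
        allowed = [k for k, e in enumerate(edges) if e[3] <= rho]
        dist, parent = dijkstra(n, edges, allowed, root)
        for u in range(n):
            if u != root and dist[u] + rho < best.get(u, (float("inf"),))[0]:
                best[u] = (dist[u] + rho, path_edges(parent, u))
    g_prime = sorted({k for _, p in best.values() for k in p})   # Step 2
    _, tree_parent = dijkstra(n, edges, g_prime, root)           # Step 3: SPT(G', root, l)
    tree, rad = [], 0.0
    for u in range(n):
        if u != root:
            p = path_edges(tree_parent, u)
            tree.append(p[0])  # the edge joining u to its parent
            rad = max(rad, sum(edges[k][2] for k in p) + max(edges[k][3] for k in p))
    return tree, rad
```

Recomputing $rad_t(T')$ from the returned tree makes it easy to compare the output against a known optimum on small instances.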

Example 1. Consider the graph G = {{root, 1,2, 3}, Ei U E 2 ), where E\ = 
{{root, 1), (1,2)} and for each e € Ei l{e) = 1, r(e) = 2, and E 2 = {{root, 2), 
(2,3)1 and for each e G E 2 l{e) = 0 and r{e) = 3|. Then, QProot,i = {root, 1), 
QProot ,2 = {root, 1,2) and QProot ,3 = {root, 2, 3). Therefore, G' that is computed 
in Step 2 of Algorithm quick_radius is exactly G. In Step 3 we compute the tree 
T' = {{r oot, 1,2, 3}, Et') where Et> = {(root, 1), (root, 2), (2, 3)}, and therefore 
the cost ofT' is defined by the path P = {root, 2, 3) where r{P) = r{root, 2) = 3^, 
1{P) = 1 -|- 0 = 1, t{P) = r{P) -\-l{P) = 3^ -|- 1 = 4^. Note that T' is the optimal 
solution. 

The time complexity of Algorithm quick_radius is $O(m^2 + mn \log n)$. It is
dominated by the time complexity of Step 1, which is $O(m^2 + mn \log n)$ by Chen
and Chin [4]. The algorithm returns a spanning tree of $G$, and therefore it is
feasible.
Before we prove the performance guarantee of the algorithm, we remark that,
as in Example 1, $G'$ may contain cycles. The reason for this is that the property
of shortest paths, “any sub-path of a shortest path must itself be a shortest
path”, does not hold for quickest paths (as noted in [4]).

Theorem 1. Algorithm quick_radius is a 2-approximation algorithm for the
QUICKEST RADIUS SPANNING TREE PROBLEM.

3 Inapproximability of the Quickest Radius Spanning 
Tree Problem 

In this section we show that unless P = NP, there is no approximation algorithm
with a performance guarantee of $2 - \epsilon$ for any $\epsilon > 0$. In order to
motivate our final reduction, we first present a simpler reduction that shows that
there is no approximation algorithm with a performance guarantee of
$\frac{3}{2} - \epsilon$. Next, we show an example where the fact that Algorithm
quick_radius chooses its output from the edge set of the quickest paths does not
allow it to have a performance guarantee better than 2. Finally, we present the
main result of this section by combining the ideas of its first two parts.

3.1 A $\frac{3}{2}$ Lower Bound on the Approximation Ratio

Lemma 1. If $P \ne NP$, then there is no $(\frac{3}{2} - \epsilon)$-approximation
algorithm for the QUICKEST RADIUS SPANNING TREE PROBLEM for any $\epsilon > 0$.

Proof. We prove the claim via reduction from SAT. Let $C_1, C_2, \ldots, C_m$ be the
clauses of a SAT instance in the variables $x_1, x_2, \ldots, x_n$. We construct a
graph $G = (V, E)$ as follows (see Figure 1):
$V = \{root, leaf\} \cup \{x_i, \bar{x}_i : 1 \le i \le n\} \cup \{C_j : C_j \text{ is a clause}\}$,
i.e., we have a special pair of vertices $root$ and $leaf$, a vertex for each
variable and another for its negation, and a vertex for each clause.
$E = E_1 \cup E_2 \cup E_3$, where
$E_1 = \{(root, x_1), (root, \bar{x}_1)\} \cup \{(x_i, x_{i+1}), (x_i, \bar{x}_{i+1}), (\bar{x}_i, x_{i+1}), (\bar{x}_i, \bar{x}_{i+1}) : 1 \le i \le n-1\} \cup \{(x_n, leaf), (\bar{x}_n, leaf)\}$,
and for every $e \in E_1$, $l(e) = 0$ and $r(e) = 1$;
$E_2 = \{(\ell, C_j) : \ell \text{ is a literal of } C_j\}$ and
$E_3 = \{(root, x_i), (root, \bar{x}_i) : 1 \le i \le n\}$, and for every
$e \in E_2 \cup E_3$, $l(e) = 0.5$ and $r(e) = 0$. This instance contains parallel
edges; however, it is easy to make it a graph by splitting edges. This defines the
instance for the QUICKEST RADIUS SPANNING TREE PROBLEM.

Assume that the formula is satisfied by some truth assignment. We now
show how to construct a spanning tree $T = (V, E_T)$ such that $rad_t(T) = 1$.
Let $E_T = E_p \cup E_s$. $E_p$ is a path of edges from $E_1$ that connects $root$
and $leaf$ where all its intermediate vertices are false literals in the truth
assignment. $E_s$ is composed of the edges $(root, \ell_i)$ from $E_3$ for all the
true literals $\ell_i$; also, each clause contains at least one true literal, and we
add to $E_s$ an edge between the clause vertex and such a true literal (exactly
one edge for each clause vertex). Then, $rad_t(T) = 1$ because for every $u \in V$
that is on the path of $E_p$ edges, we
Fig. 1. Example for the construction of Lemma 1 : Ci = a:i V 0:3 V *4 and C2 = a;2 V xa



have a path whose transmission time is exactly 1, and for every other vertex we 
have a path from root of length at most 1 and reciprocal capacity 0, so that its 
transmission time is at most 1. 

We now show that if the formula cannot be satisfied, then for every tree
$T = (V, E_T)$, $rad_t(T) \ge \frac{3}{2}$. If $P^T_{root,leaf}$ contains an edge from
$E_2 \cup E_3$ (at least one), then its length is at least $\frac{1}{2}$, its reciprocal
capacity is 1, and therefore its transmission time is at least $\frac{3}{2}$. Assume
now that $P^T_{root,leaf}$ is composed only of edges from $E_1$. Since the formula
cannot be satisfied, for every such path $P^T_{root,leaf}$ there is a clause vertex
$C_j$ all of whose literals are in $P^T_{root,leaf}$. For this $C_j$, $P^T_{root,C_j}$
has length at least $\frac{1}{2}$ and its reciprocal capacity is 1. Therefore, its
transmission time is at least $\frac{3}{2}$.

For any $\epsilon > 0$, a $(\frac{3}{2} - \epsilon)$-approximation algorithm
distinguishes between 1 and $\frac{3}{2}$, and therefore solves the SAT instance. □
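The satisfiable direction of this reduction can be checked mechanically. The sketch builds the tree described in the proof (a $root$-$leaf$ path of $E_1$ edges through the false literals, $E_3$ edges to the true literals, and one $E_2$ edge per clause) and evaluates $rad_t$. Encoding clauses as lists of signed integers (DIMACS style) and the helper names are assumptions of this sketch.

```python
from collections import defaultdict

def tree_radius(tree_edges, root):
    """rad_t(T) = max over v of l(P_root,v) + r(P_root,v); edges are (u, v, l, r)."""
    adj = defaultdict(list)
    for u, v, l, r in tree_edges:
        adj[u].append((v, l, r))
        adj[v].append((u, l, r))
    best, stack, seen = 0.0, [(root, 0.0, 0.0)], {root}
    while stack:
        u, length, recip = stack.pop()
        if u != root:
            best = max(best, length + recip)
        for v, l, r in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append((v, length + l, max(recip, r)))
    return best

def reduction_tree(n, clauses, assignment):
    """Spanning tree from the proof of Lemma 1 for a satisfying assignment.

    assignment[i] is the truth value of x_i; literal vertices are ("x", i, polarity).
    """
    def lit(i, positive):
        return ("x", i, positive)

    edges, prev = [], "root"
    for i in range(1, n + 1):                 # E_1 path through the false literals
        false_lit = lit(i, not assignment[i])
        edges.append((prev, false_lit, 0.0, 1.0))
        prev = false_lit
    edges.append((prev, "leaf", 0.0, 1.0))
    for i in range(1, n + 1):                 # E_3 edges to the true literals
        edges.append(("root", lit(i, assignment[i]), 0.5, 0.0))
    for j, clause in enumerate(clauses):      # one E_2 edge per clause
        true_lits = [x for x in clause if assignment[abs(x)] == (x > 0)]
        if not true_lits:
            raise ValueError(f"clause {j} is not satisfied")
        x = true_lits[0]
        edges.append((lit(abs(x), x > 0), ("C", j), 0.5, 0.0))
    return edges

T = reduction_tree(3, [[1, 3], [-2, 3]], {1: True, 2: False, 3: False})
print(tree_radius(T, "root"))  # 1.0
```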



3.2 On Using Only Quickest Path Edges 

Recall that $G'$ is the union of a quickest $root-u$ path in $G$ for all
$u \in V \setminus \{root\}$. Let $\epsilon > 0$ and denote by OPT the cost of an
optimal solution. In this subsection we show an instance where any spanning tree
$T'$ of $G'$ satisfies $rad_t(T') > (2 - \epsilon)OPT$.

Let $\delta > 0$ be small enough, and let $k$ be a large enough integer. We construct
a graph $G = (V, E)$ as follows: $V = \{root, leaf\} \cup \{i, i' : 1 \le i \le k-1\}$
and $E = E_1 \cup E_2 \cup E_3 \cup E_4$, where
$E_1 = \{(root, 1)\} \cup \{(i, i+1) : 1 \le i \le k-2\} \cup \{(k-1, leaf)\}$, with
$l(e) = 0$ and $r(e) = 1$ for each $e \in E_1$;
$E_2 = \{(root, 1)\} \cup \{(i, i+1) : 1 \le i \le k-2\}$, with $l(e) = \frac{1}{k}$ and
$r(e) = 0$ for each $e \in E_2$;
$E_3 = \{(i, i') : 1 \le i \le k-1\}$, with $l(e_i) = 1 - \frac{i}{k}$ and $r(e_i) = 0$
for each $e_i = (i, i')$;
$E_4 = \{(root, i') : 1 \le i \le k-1\}$, with $l(e) = 1 + \delta$ and $r(e) = 0$ for
each $e \in E_4$.

The quickest paths from root use only the edges from E_1 ∪ E_2 ∪ E_3, and the subgraph G' for k = 5 is shown in Figure 2.



[Figure: the graph G' for k = 5, on the vertices root, 1-4, 1'-4' and leaf. Legend: edges with l = 0, r = 1 (E_1); edges of E_2 with r = 0; edges of E_3 with r = 0.]

Fig. 2. The graph G' for the example in Subsection 3.2



We note that in this graph the tree T = (V, E_T), where E_T = E_1 ∪ E_4, has rad_t(T) = 1 + δ. Consider a spanning tree T' = (V, E_{T'}) where E_{T'} ⊆ E_1 ∪ E_2 ∪ E_3. If P^{T'}_{root,leaf} is composed of all the edges of E_2 and the edge (k − 1, leaf) ∈ E_1, then its length is (k − 1)/k and its reciprocal capacity is 1, so rad_t(T') ≥ 2 − 1/k. Otherwise, P^{T'}_{root,leaf} contains other edges from E_1. Denote 0 = root, and let (i − 1, i) ∈ P^{T'}_{root,leaf} be an edge from E_1 such that i is minimized (i ≥ 1). Then P^{T'}_{root,i'} is composed of the edge (i, i') and the part of P^{T'}_{root,leaf} that connects root and i. Therefore, its length is at least 1 − 1/k (by the minimality of i), and its reciprocal capacity is 1. Therefore, rad_t(T') ≥ 2 − 1/k.

Since OPT ≤ rad_t(T) = 1 + δ, every spanning tree T' of G' satisfies rad_t(T') ≥ ((2 − 1/k)/(1 + δ))·OPT. We pick δ and k such that (2 − 1/k)/(1 + δ) > 2 − ε.

This example also shows that our analysis of Algorithm quick-radius is tight. 
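The example can be checked mechanically for a small case. The Python sketch below is only an illustration, not the authors' code. It builds the instance for k = 5 and δ = 1/100, and it assumes l(e) = 1/k on E_2 and l(e_i) = 1 − i/k on E_3, values inferred from the proof of the 2 − 1/k bound because the extraction damaged them. It computes rad_t as the maximum over u of l(P) + max_{e∈P} r(e) along the tree path from root. It then enumerates every spanning tree of G' = E_1 ∪ E_2 ∪ E_3. The helper names `instance`, `rad_t` and `is_spanning_tree` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def is_spanning_tree(vertices, edges):
    """True iff the (u, v, l, r) edges are |V|-1 edges with no cycle (union-find)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # closes a cycle; this also rejects parallel edges
            return False
        parent[ru] = rv
    return len(edges) == len(vertices) - 1


def rad_t(vertices, edges, root):
    """max over u of l(P) + max_{e in P} r(e), P being the root-u path of the tree."""
    adj = {v: [] for v in vertices}
    for u, v, l, r in edges:
        adj[u].append((v, l, r))
        adj[v].append((u, l, r))
    best = Fraction(0)
    stack, seen = [(root, Fraction(0), Fraction(0))], {root}
    while stack:
        x, length, rmax = stack.pop()
        best = max(best, length + rmax)
        for y, l, r in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, length + l, max(rmax, r)))
    return best


def instance(k, delta):
    """E1..E4 of the example; the E2 and E3 lengths are reconstructed assumptions."""
    spine = ["root"] + [str(i) for i in range(1, k)]
    E1 = [(a, b, Fraction(0), Fraction(1)) for a, b in zip(spine, spine[1:] + ["leaf"])]
    E2 = [(a, b, Fraction(1, k), Fraction(0)) for a, b in zip(spine, spine[1:])]
    E3 = [(str(i), f"{i}'", 1 - Fraction(i, k), Fraction(0)) for i in range(1, k)]
    E4 = [("root", f"{i}'", 1 + delta, Fraction(0)) for i in range(1, k)]
    vertices = spine + ["leaf"] + [f"{i}'" for i in range(1, k)]
    return vertices, E1, E2, E3, E4


k, delta = 5, Fraction(1, 100)
V, E1, E2, E3, E4 = instance(k, delta)
print(rad_t(V, E1 + E4, "root"))  # 101/100, i.e. 1 + delta
G_prime = E1 + E2 + E3
print(min(rad_t(V, T, "root")
          for T in combinations(G_prime, len(V) - 1) if is_spanning_tree(V, T)))  # 9/5
```

Exhaustive enumeration works here only because G' has 13 edges, so there are 715 candidate subsets of size 9. It checks the claim and is not meant as an algorithm.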

3.3 A Tight Lower Bound on the Approximability Ratio 

In this subsection we show that unless P = NP there is no polynomial-time approximation algorithm that guarantees a ratio of 2 − ε for any ε > 0. Since Algorithm quick-radius is a 2-approximation algorithm, it is the best possible.

Theorem 2. If P ≠ NP, then there is no (2 − ε)-approximation algorithm for the QUICKEST RADIUS SPANNING TREE PROBLEM for any ε > 0.

4 The Quickest Diameter Spanning Tree Problem 

In this section we consider the quickest diameter spanning tree problem. We present a 3/2-approximation algorithm, and prove that unless P = NP this is the best possible.
Denote by OPT the cost of an optimal solution, T_opt = (V, E_opt), to the QUICKEST DIAMETER SPANNING TREE PROBLEM. Denote by MDT(G, l) the minimum diameter spanning tree of a graph G where the edges are endowed with a length function l.

Algorithm quick-diameter.

1. For every r ∈ {r(e) : e ∈ E} do
   a) Let G_r = (V, E_r), where E_r = {e ∈ E : r(e) ≤ r}.
   b) T_r = MDT(G_r, l).

2. Let T' be a minimum cost solution among {T_r}_{r ∈ {r(e) : e ∈ E}}.

3. Return T'.
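The following is a brute-force sketch of Algorithm quick-diameter, meant for intuition on tiny graphs only. The exhaustive `mdt_bruteforce` is a hypothetical stand-in for MDT(G_r, l). It enumerates spanning trees rather than using a polynomial minimum diameter spanning tree algorithm, and it skips thresholds whose G_r is disconnected. Edges are (u, v, l, r) tuples with integer values. diam_t is taken as the maximum over vertex pairs of l(P) + max_{e∈P} r(e), which matches how transmission times are combined in the proof of Lemma 2.

```python
import random
from itertools import combinations


def is_spanning_tree(n, edges):
    """Union-find test: do the (u, v, l, r) edges form a spanning tree of 0..n-1?"""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return len(edges) == n - 1


def path_stats(n, tree):
    """Yield (l(P), max_{e in P} r(e)) for the tree path P of every ordered vertex pair."""
    adj = [[] for _ in range(n)]
    for u, v, l, r in tree:
        adj[u].append((v, l, r))
        adj[v].append((u, l, r))
    for s in range(n):
        stack, seen = [(s, 0, 0)], {s}
        while stack:
            x, length, rmax = stack.pop()
            yield length, rmax
            for y, l, r in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, length + l, max(rmax, r)))


def diam_l(n, tree):
    return max(length for length, _ in path_stats(n, tree))


def diam_t(n, tree):
    # assumed transmission time: t(P) = l(P) + max_{e in P} r(e)
    return max(length + rmax for length, rmax in path_stats(n, tree))


def spanning_trees(n, edges):
    return [T for T in combinations(edges, n - 1) if is_spanning_tree(n, T)]


def mdt_bruteforce(n, edges):
    """Hypothetical stand-in for MDT(G_r, l): exhaustive, tiny graphs only."""
    trees = spanning_trees(n, edges)
    return min(trees, key=lambda T: diam_l(n, T)) if trees else None


def quick_diameter(n, edges):
    best = None
    for r in sorted({e[3] for e in edges}):                # step 1: every threshold r
        T_r = mdt_bruteforce(n, [e for e in edges if e[3] <= r])
        if T_r is not None and (best is None or diam_t(n, T_r) < diam_t(n, best)):
            best = T_r                                     # step 2: cheapest T_r
    return best                                            # step 3


rng = random.Random(1)
n = 5
edges = [(u, v, rng.randint(0, 5), rng.randint(0, 5)) for u, v in combinations(range(n), 2)]
T = quick_diameter(n, edges)
opt = min(diam_t(n, S) for S in spanning_trees(n, edges))
print(diam_t(n, T), opt)   # Theorem 3 predicts diam_t(T) <= 1.5 * opt
```

Because the brute-force MDT is exact, this sketch can be used to spot-check the 3/2 guarantee of Theorem 3 on small random instances.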



Lemma 2. Let r_max = max_{e ∈ E_opt} r(e). For every pair of vertices u, v ∈ V,

$$l(P^{T_{opt}}_{u,v}) \le 2\,\mathrm{OPT} - 2r_{\max}.$$

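Proof. The claim clearly holds if the path P^{T_opt}_{u,v} contains an edge e with r(e) = r_max: then l(P^{T_opt}_{u,v}) ≤ OPT − r_max ≤ 2OPT − 2r_max, since r_max ≤ OPT. Assume otherwise (that the path contains no such edge), and let y ∈ P^{T_opt}_{u,v} be such that there exists z ∉ P^{T_opt}_{u,v} for which P^{T_opt}_{y,z} contains an edge e with r(e) = r_max, and P^{T_opt}_{u,v} and P^{T_opt}_{y,z} are edge-disjoint. Then both P^{T_opt}_{u,z} and P^{T_opt}_{v,z} contain e, so 2OPT ≥ t(P^{T_opt}_{u,z}) + t(P^{T_opt}_{v,z}) ≥ l(P^{T_opt}_{u,z}) + l(P^{T_opt}_{v,z}) + 2r_max ≥ l(P^{T_opt}_{u,v}) + 2r_max, and the claim follows. □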

For a tree T, denote diam_l(T) = max_{u,v ∈ V} l(P^T_{u,v}).

Theorem 3. Algorithm quick-diameter is a 3/2-approximation algorithm for the QUICKEST DIAMETER SPANNING TREE PROBLEM.

Proof. Algorithm quick-diameter is polynomial, and returns a feasible solution. It remains to show the approximation ratio of the algorithm.

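1. Every edge of T_opt belongs to G_{r_max}. Hence, by the optimality of T_{r_max} (for the minimum diameter spanning tree problem on G_{r_max}), diam_l(T_{r_max}) ≤ diam_l(T_opt).

2. By Lemma 2, diam_l(T_opt) ≤ 2OPT − 2r_max. Therefore, by 1, diam_l(T_{r_max}) ≤ 2OPT − 2r_max.

3. Since G_{r_max} contains only edges e with r(e) ≤ r_max, every path P in T_{r_max} satisfies t(P) ≤ l(P) + r_max.

4. By 2 and 3, diam_t(T_{r_max}) ≤ diam_l(T_{r_max}) + r_max ≤ 2OPT − r_max.

5. By 1, diam_l(T_{r_max}) ≤ diam_l(T_opt) ≤ OPT.

6. By 3 and 5, diam_t(T_{r_max}) ≤ OPT + r_max.

7. Adding 4 and 6 gives 2·diam_t(T_{r_max}) ≤ 3OPT, that is, diam_t(T_{r_max}) ≤ (3/2)OPT. Therefore, diam_t(T') ≤ (3/2)OPT.

□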



Theorem 4. If P ≠ NP, then there is no (3/2 − ε)-approximation algorithm for the QUICKEST DIAMETER SPANNING TREE PROBLEM for any ε > 0.
5 Discussion

In this paper we consider a pair of new problems that consider the transmission 
time along paths instead of the usual length of the paths. The two problems 
that we consider are to minimize the diameter and the radius of a spanning 
tree with respect to the transmission time. There are numerous other graph 
problems of interest that ask to compute a minimum cost subgraph where the 
cost is a function of the distances among vertices on the subgraph. Defining these 
problems under the transmission time definition of length opens an interesting 
area for future research. 
Abstract. We present a randomized (89/169 − ε)-approximation algorithm for the weighted maximum triangle packing problem, for any given ε > 0. This is the first algorithm for this problem whose performance guarantee is better than 1/2. The algorithm also improves the best known approximation bound for the maximum 2-edge path packing problem.

Keywords: Analysis of algorithms, maximum triangle packing, 2-edge 
paths. 



1 Introduction 

Let G = (V, E) be a complete (undirected) graph with vertex set V such that |V| = 3n, and edge set E. For e ∈ E let w(e) ≥ 0 be its weight. For E' ⊆ E we denote w(E') = Σ_{e∈E'} w(e). For a random subset E' ⊆ E, w(E') denotes the expected value. In this paper, a k-path is a simple k-edge path, and similarly a k-cycle is a k-edge simple cycle. A 3-cycle is also called a triangle. The MAXIMUM TRIANGLE PACKING PROBLEM is to compute a set of n vertex-disjoint triangles with maximum total edge weight.

In this paper we propose a randomized algorithm which is the first approximation algorithm with performance guarantee strictly better than 0.5. Specifically, our algorithm is an (89/169 − ε)-approximation for any given ε > 0.

The MAXIMUM 2-EDGE PATH PACKING PROBLEM requires computing a maximum weight set of vertex disjoint 2-edge paths. We improve the best known approximation bound for this problem and prove that our triangle packing algorithm is also a (35/67 − ε)-approximation algorithm for this problem.

2 Related Literature 

The problems of whether the vertex set of a graph can be covered by vertex disjoint 2-edge paths or vertex disjoint triangles are NP-complete (see [6], pp. 76 and 192, respectively). These results imply that the two problems defined in the introduction are NP-hard.

The problems considered in this paper are special cases of the 3-SET PACKING PROBLEM. In the unweighted version of this problem, a collection of sets of cardinality at most 3 each is given. The goal is to compute a maximum number of disjoint sets from this collection. In the weighted version, each set has a weight, and the goal is to find a sub-collection of disjoint sets having maximum total weight. We now survey the existing approximation results for the 3-SET PACKING PROBLEM, which of course also apply to the problems treated in this paper.

Hurkens and Schrijver [11] proved that a natural local search algorithm can be used to give a (2/3 − ε)-approximation algorithm for unweighted 3-SET PACKING for any ε > 0 (see also [7]). Arkin and Hassin [1] analyzed the local search algorithm when applied to the weighted k-SET PACKING PROBLEM. They proved that for k = 3, the result is a (1/2 − ε)-approximation. Bafna, Narayanan, and Ravi [2] analyzed a restricted version of the local search algorithm in which the depth of the search is also k, for a more general problem of computing a maximum independent set in (k + 1)-claw free graphs. For k = 3 it gives a 3/7-approximation. A more involved algorithm for k-set packing, which combines greedy and local search ideas, was given by Chandra and Halldórsson [4]. This algorithm yields improved bounds for k ≥ 6. Finally, Berman [3] improved the approximation ratios for all k ≥ 4, and for the maximum independent set in (k + 1)-claw free graphs; however, the bound for k = 3 is still 0.5.

Kann [12] proved that the unweighted triangle packing problem is APX-complete even for graphs with maximum degree 4. Chlebík and Chlebíková [5] proved that it is NP-hard to obtain an approximation factor better than 0.9929.

The MAXIMUM TRAVELING SALESMAN PROBLEM (Max TSP) asks to compute in an edge weighted graph a Hamiltonian cycle (or tour) of maximum weight. Max TSP is a relaxation of MAX k-EDGE PATH PACKING for any k. Therefore, an α-approximation algorithm for the former problem can be used to approximate the latter [10]: simply delete every (k + 1)-st edge from the TSP solution. Choose the starting point so that the deleted weight is at most 1/(k + 1) of the total weight. The result is an αk/(k + 1)-approximation. By applying the 25/33-approximation randomized algorithm of Hassin and Rubinstein [9] for Max TSP, we obtain a (50/99 − ε)-approximation for k = 2. For the MAXIMUM 3-EDGE PATH PACKING PROBLEM there is a better 3/4 bound in [10].
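The deletion step described above can be sketched as follows. `paths_from_tour` is a hypothetical helper, and it assumes the tour is given as the list of its edge weights in tour order. The k + 1 residue classes of edge positions partition the tour, so the lightest class weighs at most 1/(k + 1) of the total, and the k edges following each deleted edge form one k-path.

```python
def paths_from_tour(weights, k):
    """weights[t] = weight of tour edge t, which joins tour positions t and t+1.
    Delete every (k+1)-st edge, starting at the offset that deletes the least weight.
    Returns (offset, kept weight, k-edge paths given as lists of edge indices)."""
    c = len(weights)
    if c % (k + 1):
        raise ValueError("the tour length must be a multiple of k + 1")
    offset = min(range(k + 1), key=lambda s: sum(weights[s::k + 1]))
    paths = [[(offset + 1 + i * (k + 1) + j) % c for j in range(k)]
             for i in range(c // (k + 1))]
    kept = sum(weights[t] for path in paths for t in path)
    return offset, kept, paths


weights = [5, 1, 4, 2, 8, 3]          # a 6-vertex tour, so n = 2 two-edge paths
print(paths_from_tour(weights, 2))    # (0, 16, [[1, 2], [4, 5]])
```

Here the kept weight 16 is at least (2/3)·23, as the averaging argument guarantees.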

In this paper we give the first algorithm whose performance guarantee is strictly greater than 1/2 for maximum triangle packing. Our bound is (89/169 − ε) for every ε > 0. The algorithm is described in Section 3. We also improve the above-mentioned bound for maximum weighted 2-path packing. This result is described in Section 4. Specifically, our algorithm returns a (35/67 − ε)-approximation for this problem.



3 Maximum Weighted Triangle Packing 

A (perfect) binary 2-matching (also called 2-factor or cycle cover) is a subgraph in which every vertex in V has a degree of exactly 2. A maximum binary 2-matching is one with maximum total edge weight. Hartvigsen [8] showed how to compute a maximum binary 2-matching in O(n^3) time (see [14] for another O(n^2|E|) algorithm). Note that a 2-matching consists of disjoint simple cycles of at least three edges each, that together span V.

We denote the weight of an optimal solution by opt. 

Algorithm WTP is given in Figure 1. The algorithm starts by computing a maximum binary 2-matching. Long cycles, where |C| > ε^{-1} (|C| denotes the number of vertices of C), are broken into paths with at most ε^{-1} edges each, losing a fraction of at most ε of their weight. To simplify the exposition, the algorithm completes the paths into cycles to form a cycle cover C. The cycle cover C consists of vertex disjoint cycles C_1, ..., C_r satisfying 3 ≤ |C_i| ≤ ε^{-1} + 1, i = 1, ..., r. Since the MAXIMUM CYCLE COVER PROBLEM is a relaxation of the WEIGHTED MAXIMUM TRIANGLE PACKING PROBLEM,

w(C) ≥ (1 − ε)opt.

Algorithm WTP constructs three solutions and selects the best one. The first solution is attractive when a large fraction of opt comes from edges that belong to C. The second is attractive when a large fraction of opt comes from edges whose two vertices are on the same cycle of C. The third solution deals with the remaining case, in which the optimal solution also uses a considerable weight of edges that connect distinct cycles of C.



WTP
input
1. A complete undirected graph G = (V, E) with weights w_ij, (i, j) ∈ E.
2. A constant ε > 0.
returns A triangle packing.
begin
  Compute a maximum cycle cover {C'_1, ..., C'_{r'}}.
  for i = 1, ..., r':
    if |C'_i| > ε^{-1}
    then
      Remove from C'_i ⌈ε|C'_i|⌉ edges of total weight at most ε·w(C'_i),
        to form a set of paths of at most ε^{-1} edges each.
      Close each of the paths into a cycle by adding an edge.
    end if
  end for
  C = {C_1, ..., C_r} := the resulting cycle cover.
  TP_1 := A1(G, C).
  TP_2 := A2(G, C).
  TP_3 := A3(G, C).
  return the solution with maximum weight among TP_1, TP_2 and TP_3.
end WTP

Fig. 1. Algorithm WTP
A1
input
1. A complete undirected graph G = (V, E) with weights w_ij, (i, j) ∈ E.
2. A cycle cover C = {C_1, ..., C_r}.
returns A triangle packing TP_1.
begin
  TP_1 := ∅.
  for i = 1, ..., r:
    if |C_i| = 3
    then
      TP_1 := TP_1 ∪ {C_i}.
    elseif |C_i| = 5
    then
      Let e_1, ..., e_5 be the edges of C_i in cyclic order.
      W_i := w(e_i) + w(e_{i+1}) + w(e_{i+3})/2 [indices taken mod 5].
      W_j := max{W_i : i = 1, ..., 5}.
      TP_1 := TP_1 ∪ {{j, j + 1, j + 2}}.
      E' := E' ∪ {(j + 3, j + 4)}.
    elseif |C_i| ∉ {3, 5}
    then
      Let e_1, ..., e_c be the edges of C_i in cyclic order.
      Add to TP_1 triangles induced by adjacent pairs of edges
        from C_i, of total weight at least w(C_i)/2.
      Add to V' the vertices of C_i that are not incident to
        selected triangles.
    end if
  end for
  E'' := the ⌈|E'|/2⌉ heaviest edges from E'.
  for every e ∈ E''
    Add to TP_1 triangles formed from e and a third vertex either from
      E' \ E'' or from V'.
  return TP_1.
end A1

Fig. 2. Algorithm A1



The first solution is constructed by Algorithm A1 (see Figure 2). It selects all the 3-cycles of C. From every k-cycle C_i with k ≠ 3, 5, the algorithm selects 3-sets consisting of vertices of pairs of adjacent edges in C_i of total weight at least (1/2)w(C_i), to form triangles in the solution. This task can be done by selecting an appropriate edge of the cycle, and then deleting this edge with one or two of its neighboring edges (depending on the value of |C_i| mod 3), and then every third edge of the cycle according to an arbitrary orientation.
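One way to realize this selection is sketched below. It is an illustration and not necessarily the exact rule of Algorithm A1: instead of first fixing an edge to delete, it tries every rotation of the pattern "keep, keep, skip" and keeps the best. Each edge is kept in exactly 2⌊c/3⌋ of the c rotations. The best rotation therefore keeps at least a 2⌊c/3⌋/c ≥ 1/2 fraction of w(C_i) whenever c = 4 or c ≥ 6. For c = 5 this fraction is only 2/5, which is why 5-cycles are handled separately. The function name is hypothetical.

```python
def adjacent_pair_triangles(weights):
    """weights[t] = w(e_t), where e_t joins v_t and v_{t+1 mod c}; assumes c >= 4, c != 5.
    Tries every rotation of 'keep, keep, skip' and returns the best
    (kept weight, triangles as vertex-index triples)."""
    c = len(weights)
    m = c // 3                       # number of vertex-disjoint triangles formed
    best = None
    for s in range(c):
        triples = [tuple((s + 3 * i + d) % c for d in range(3)) for i in range(m)]
        kept = sum(weights[(s + 3 * i) % c] + weights[(s + 3 * i + 1) % c] for i in range(m))
        if best is None or kept > best[0]:
            best = (kept, triples)
    return best


print(adjacent_pair_triangles([3, 1, 4, 1, 5, 9, 2]))   # (19, [(1, 2, 3), (4, 5, 6)])
```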

A2
input
1. A complete undirected graph G = (V, E) with weights w_ij, (i, j) ∈ E.
2. A cycle cover C = {C_1, ..., C_r}.
returns A triangle packing TP_2.
begin
  V_1, ..., V_r := the vertex sets of C_1, ..., C_r, respectively.
  for j = 0, ..., n:
    F(0, j) := 0.
  end for
  for i = 1, ..., r:
    F(i, 0) := 0.
    for j = 1, ..., n:
      W(i, j) := maximum weight of at most j disjoint 3-sets and 2-sets
        contained in V_i.
      F(i, j) := max{W(i, k) + F(i − 1, j − k) : k ≤ j}.
    end for
  end for
  P(r, n) := a collection of 2-sets and 3-sets that gives F(r, n).
  return TP_2, a completion of P(r, n) into a triangle packing.
end A2

Fig. 3. Algorithm A2

5-cycles obtain special treatment (since the above process would only guarantee 2/5 of their weight): From each 5-cycle we select three edges. Two of these edges are adjacent and the third is disjoint from the two. The former edges define a triangle which is added to the solution. The third edge is added to a special set E'. The selection is made so that the weight of the adjacent pair plus half the weight of the third edge is maximized. After going through every cycle, a subset of E' of total weight at least (1/2)w(E') is matched to unused vertices to form triangles. (Since |V| = 3n, there should be a sufficient number of unused vertices.) Thus, in total we gain at least half of the cycle's weight.
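The 5-cycle rule is small enough to write out. The sketch below uses a hypothetical function name and 0-based indices, with edge e_i joining v_i and v_{i+1 mod 5}. It evaluates W_i = w(e_i) + w(e_{i+1}) + w(e_{i+3})/2 for every i. The five scores sum to (5/2)w(C), so the best score is at least w(C)/2, which is the averaging step behind "at least half of the cycle's weight".

```python
from fractions import Fraction


def five_cycle_choice(w):
    """w[i] = weight of e_i (0-based), e_i joining v_i and v_{i+1 mod 5}.
    Returns (triangle vertices, spare edge sent to E', score W_j)."""
    score = [w[i] + w[(i + 1) % 5] + Fraction(w[(i + 3) % 5], 2) for i in range(5)]
    j = max(range(5), key=lambda i: score[i])
    triangle = (j, (j + 1) % 5, (j + 2) % 5)     # vertices spanned by e_j and e_{j+1}
    spare_edge = ((j + 3) % 5, (j + 4) % 5)      # e_{j+3}, disjoint from the triangle
    return triangle, spare_edge, score[j]


tri, spare, score = five_cycle_choice([4, 1, 1, 6, 2])
print(tri, spare, score)   # (2, 3, 4) (0, 1) 9
```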



The second solution is constructed by Algorithm A2 (see Figure 3). The algorithm enumerates all the possible packings of vertex disjoint 2-sets and 3-sets in the subgraph induced by the vertex set of each cycle C_i ∈ C. The weight of a subset is defined to be the total edge weight of the induced subgraph (a triangle or a single edge). A dynamic programming recursion is then used to compute the maximum weight of at most n subsets from the collection. In this formulation, we use F(i, j) to denote the maximum weight that can be obtained from at most j such subsets, subject to the condition that the vertex set of each subset must be fully contained in the vertex set of one of the cycles C_1, ..., C_i. After computing F(r, n), the solution that produced this value is completed in an arbitrary way to a triangle packing TP_2.
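The dynamic program can be sketched as below. The sketch assumes weights are stored in a dict keyed by `frozenset({u, v})` for every pair of vertices, since the graph is complete. The per-cycle table W(i, ·) is computed by exhaustive search, which is affordable only because every cycle has at most ε^{-1} + 1 vertices. The function names are hypothetical, and the sketch returns the value F(r, n) rather than the packing itself.

```python
from functools import lru_cache
from itertools import combinations


def set_weight(vertices, w):
    """Weight of a 2-set or 3-set: total weight of the edges it induces."""
    return sum(w[frozenset(pair)] for pair in combinations(vertices, 2))


def cycle_table(vertices, w, n):
    """W[j] = maximum weight of at most j disjoint 2-sets/3-sets inside `vertices`."""
    @lru_cache(maxsize=None)
    def best(remaining, j):
        if j == 0 or len(remaining) < 2:
            return 0
        first, rest = remaining[0], remaining[1:]
        value = best(rest, j)                        # leave `first` uncovered
        for size in (1, 2):                          # `first` joins a 2-set or a 3-set
            for others in combinations(rest, size):
                left = tuple(v for v in rest if v not in others)
                value = max(value, set_weight((first,) + others, w) + best(left, j - 1))
        return value
    vs = tuple(sorted(vertices))
    return [best(vs, j) for j in range(n + 1)]


def a2_value(cycles, w, n):
    """F(r, n) of Algorithm A2: combine the per-cycle tables W(i, .) knapsack-style."""
    F = [0] * (n + 1)                                # F(0, j) = 0
    for cycle in cycles:
        W = cycle_table(cycle, w, n)
        F = [max(W[k] + F[j - k] for k in range(j + 1)) for j in range(n + 1)]
    return F[n]


w = {frozenset(p): 1 for p in combinations(range(6), 2)}   # unit weights on K_6
print(a2_value([[0, 1, 2], [3, 4, 5]], w, 2))              # 6: two triangles
```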

The third solution is constructed by Algorithm A3 (see Figure 5). It starts 
by deleting edges from C according to Procedure Delete described in Figure 4. 
Delete
input A set of cycles C = {C_1, ..., C_r}.
returns A set of paths 𝒫.
begin
  for i = 1, ..., r:
    Randomly select an edge from C_i and mark it e_1.
    Delete e_1.
    Denote the edges of C_i in cyclic order according to an arbitrary
      orientation and starting at e_1 by e_1, ..., e_c, where c = |C_i| = 4k + l
      and l ∈ {0, ..., 3}.
    Delete from C_i the edges e_j such that j ≡ 1 (mod 4) and j ≤ c − 3.
    if l = 1
    then
      Delete e_{c−1} with probability 1/4.
    elseif l = 2
    then
      Delete e_{c−1} with probability 1/2.
    elseif l = 3 and c > 3
    then
      Delete e_{c−2} with probability 3/4.
    end if
  end for
  Denote the resulting path set by 𝒫.
  return 𝒫.
end Delete

Fig. 4. Procedure Delete



The result is a collection 𝒫 of subpaths of C such that the following lemma holds:



Lemma 1. Consider a cycle C_i ∈ C. Let E^d_i be the edge set deleted from C_i by Procedure Delete. Then:

1. e_1 ∈ E^d_i; in particular, E^d_i ≠ ∅.

2. The edges in E^d_i are vertex disjoint.

3. The expected size of E^d_i is at least |C_i|/4.

4. The probability that any given vertex of C_i is adjacent to an edge in E^d_i is at least 1/2.

Proof.

1. The first property holds since the edge e_1 is deleted from C_i.

2. The second property holds by the way edges are selected for deletion.

3. The expected value of |E^d_i| is |C_i|/3 for a triangle and |C_i|/4 otherwise.

4. If a vertex belongs to a 3-cycle in C then it is incident to e_1 with probability 2/3. If the vertex is in a k-cycle, k > 3, then since the expected size of E^d_i is |C_i|/4 and these edges are vertex disjoint, the expected number of vertices incident to these edges is |C_i|/2. Since e_1 is chosen uniformly at random, all vertices of C_i have the same probability of being incident to a deleted edge, and this probability is therefore 1/2.

□

A3
input
1. A complete undirected graph G = (V, E), |V| = 3n, with weights w_ij, (i, j) ∈ E.
2. A cycle cover C = {C_1, ..., C_r}.
returns A triangle packing TP_3.
begin
  Let E' be the edges of G with two ends in different cycles of C.
  Compute a maximum weight matching M' ⊆ E'.
  𝒫 := Delete(C).
  M := {(i, j) ∈ M' : i and j are end vertices of paths in 𝒫}.
  % M ∪ 𝒫 consists of paths P*_1, ..., P*_s and cycles C*_1, ..., C*_t such that
  % each cycle contains at least two edges from M.
  P* := {P*_1, ..., P*_s}.
  begin cycle canceling step:
    for i = 1, ..., t:
      Randomly select an edge e ∈ C*_i ∩ M.
      Add C*_i \ {e} to P*.
    end for
  end cycle canceling step
  Arbitrarily complete P* to a Hamiltonian cycle T.
  Delete n edges from T to obtain a collection P_3 of 2-edge paths of total weight
    at least (2/3)w(T).
  return TP_3, a completion of P_3 into a triangle packing.
end A3

Fig. 5. Algorithm A3

Consider a given cycle C ∈ C with |C| = 4k + l. If C is a triangle then exactly one edge is deleted by Procedure Delete. If l = 0 then every fourth edge is deleted. If l ∈ {1, 2, 3} and |C| > 3 then the number of deleted edges may be k or k + 1, and in each case their location relative to e_1 is uniquely determined. Let us define a binary random variable X_C such that the number of deleted edges in C is k + X_C. X_C uniquely determines the deletion pattern in C, specifying the spaces among deleted edges, but not their specific location. Since every edge has equal probability to be chosen as e_1, the possible mappings of the deletion pattern into C have equal probabilities. Suppose that the deletion pattern is known. Every mapping of it into C specifies a subset S of vertices that are incident to deleted edges, and thus these vertices are the end vertices of the paths in 𝒫. We call these vertices free.
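Lemma 1 and the equal-probability argument above can be checked exactly for small cycles. The sketch below enumerates the uniformly random choice of e_1 together with the final coin of Procedure Delete. It assumes the deletion probabilities are 1/4, 1/2 and 3/4 for l = 1, 2, 3. The extraction lost these values, so they are inferred here from Lemma 1(3). For each cycle length, the sketch reports the expected number of deleted edges, whether the deleted edges are always vertex disjoint, and the smallest probability that a given vertex becomes free. Function names are hypothetical.

```python
from fractions import Fraction

# Assumed probabilities of the optional last deletion, indexed by l = c mod 4.
EXTRA_PROB = {0: Fraction(0), 1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(3, 4)}


def deleted_positions(c, extra):
    """Positions j (1-based, counted from e_1) deleted on a c-cycle;
    `extra` says whether the optional last deletion took place."""
    l = c % 4
    pos = {1} | {j for j in range(1, c - 2) if j % 4 == 1}   # j = 1 (mod 4), j <= c - 3
    if extra and l in (1, 2):
        pos.add(c - 1)
    elif extra and l == 3 and c > 3:
        pos.add(c - 2)
    return pos


def outcomes(c):
    """Yield (probability, deleted edge indices); edge t joins vertices t, t+1 (mod c)."""
    p = EXTRA_PROB[c % 4]
    for start in range(c):                                   # e_1 uniform over c edges
        for extra, q in ((True, p), (False, 1 - p)):
            if q:
                yield Fraction(1, c) * q, {(start + j - 1) % c
                                           for j in deleted_positions(c, extra)}


def check(c):
    exp_size = sum(pr * len(d) for pr, d in outcomes(c))
    disjoint = all(len({x for t in d for x in (t, (t + 1) % c)}) == 2 * len(d)
                   for _, d in outcomes(c))
    free_prob = [sum(pr for pr, d in outcomes(c) if any(v in (t, (t + 1) % c) for t in d))
                 for v in range(c)]
    return exp_size, disjoint, min(free_prob)


for c in (3, 4, 5, 6, 7, 8, 9):
    print(c, *check(c))
```

For the triangle the expected size is 1 and each vertex is free with probability 2/3. For longer cycles the expected size is exactly c/4 and each vertex is free with probability exactly 1/2, in line with properties 3 and 4.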

Algorithm A3 computes a maximum weight matching M' over the set E' ⊆ E of edges with ends in distinct cycles of C. Only the subset M ⊆ M' of edges whose two ends become free in the deletion procedure is useful, and it is used to generate paths with the undeleted edges from C. However, cycles may also be generated this way, and a cycle canceling step further deletes edges to cancel these cycles. Figure 6 illustrates this situation, where broken lines denote deleted edges and dotted lines denote edges that belong to M. The cycle C* consists of the vertices {v, w, a, b, c, d, f, u, g}, and one of the edges (v, w), (b, c) and (f, u) that belong to M will be deleted to cancel C*.

Consider an edge e = (v, w) ∈ M. Denote by π the probability that e is deleted in the cycle canceling step, given that its vertices v and w are free (i.e., v, w ∈ S).

Lemma 2. π ≤ 1/4.

Proof. Suppose that v ∈ C. For every u ∈ C, define dist(u, v) to be the minimum (over the two possibilities) number of edges on a u-v subpath of C. (For example, dist(u, v) = 2 in Figure 6.)

To prove that π ≤ 1/4, we define the following events:






Fig. 6. A cycle in M ∪ 𝒫



E_{δ,x}: X_C = x; v, w ∈ S; 𝒫 ∪ M contains a u-v path P such that e is on P, C ∩ P = {u, v}, u ∈ S, and dist(u, v) = δ. (For example, P = (v, w, a, b, c, d, f, u) and δ = 2 in Figure 6.)

E_C: u and v belong to a common path in 𝒫.

E_D: u and v belong to different paths in 𝒫.

If e belongs to a cycle C* ⊆ 𝒫 ∪ M, then in the event E_C, |C* ∩ M| ≥ 2, whereas in the event E_D, |C* ∩ M| ≥ 4. Therefore the respective probabilities that e is deleted in the cycle canceling step are at most 1/2 and 1/4. Using this observation,

$$\pi \le \sum_{\delta \ge 1} \sum_{x=0}^{1} \Pr(E_{\delta,x}) \left[ \frac{1}{2}\Pr(E_C \mid E_{\delta,x}) + \frac{1}{4}\Pr(E_D \mid E_{\delta,x}) \right].$$

The proof is completed by Lemma 3 below. □

Let π(δ, x) = (1/2)Pr(E_C | E_{δ,x}) + (1/4)Pr(E_D | E_{δ,x}).

Lemma 3. For every δ and x,

π(δ, x) ≤ 1/4.

Proof. The proof contains a detailed case analysis. It will be included in the full 
version of this paper. □ 



Theorem 1. max{w(TP_1), w(TP_2), w(TP_3)} ≥ (89/169)(1 − ε)opt.

Proof. Algorithm WTP first breaks long cycles, losing at most a fraction ε of their weight, and obtains a cycle cover C. Thus, w(C) ≥ (1 − ε)opt.

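Let α denote the proportion of w(C) contained in 3-edge cycles. The solution TP_1 contains all the triangles of C and at least half of the weight of any other cycle. Thus,

$$w(TP_1) \ge \left(\alpha + \frac{1-\alpha}{2}\right) w(C) = \frac{1+\alpha}{2}\, w(C) \ge \frac{1+\alpha}{2}(1-\varepsilon)\,\mathrm{opt}. \qquad (1)$$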

Consider an optimal solution. Define T_int to be the edges of this solution whose end vertices are in the same connectivity component of C, and suppose that w(T_int) = β·opt. Then

$$w(TP_2) \ge w(T_{int}) = \beta\,\mathrm{opt}. \qquad (2)$$



Algorithm A3 computes a Hamiltonian cycle T, and then constructs a triangle packing of weight at least (2/3)w(T). T is built as follows. First, in expectation, 1/3 of the weight of any triangle and 1/4 of the weight of any other cycle of C is deleted, leaving weight of [(2/3)α + (3/4)(1 − α)]w(C). Then edges from the matching M' are added. Originally w(M') ≥ ((1 − β)/2)opt. However, only edges with two free ends are used, and by Lemma 1(4) their expected weight is at least (1/4)w(M') ≥ ((1 − β)/8)opt. Then cycles are broken, deleting at most 1/4 of the remaining weight (by Lemma 2). Hence the added weight is at least (3/32)(1 − β)opt. Altogether,



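$$w(T) \ge \left[\frac{2}{3}\alpha + \frac{3}{4}(1-\alpha)\right] w(C) + \frac{3}{32}(1-\beta)\,\mathrm{opt} \ge \left[\frac{2}{3}\alpha + \frac{3}{4}(1-\alpha)\right](1-\varepsilon)\,\mathrm{opt} + \frac{3}{32}(1-\beta)\,\mathrm{opt}.$$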



R. Hassin and S. Rubinstein 



From this, a solution is formed after deleting at most 1/3 of the weight. Hence,

$$w(TP_3) \ge \left[\frac{4}{9}\alpha + \frac{1}{2}(1-\alpha) + \frac{1}{16}(1-\beta)\right](1-\varepsilon)\,\mathrm{opt}. \qquad (3)$$

It is now easy to prove that max{w(TP_1), w(TP_2), w(TP_3)} ≥ (89/169)(1 − ε)opt. If α ≥ 9/169 then w(TP_1) ≥ (89/169)(1 − ε)opt; if β ≥ 89/169 then w(TP_2) ≥ (89/169)opt; and if neither condition holds then w(TP_3) ≥ (89/169)(1 − ε)opt. □
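The final case analysis can be checked with exact arithmetic. This sketch assumes the three bounds read b_1 = a·α + (1/2)(1 − α) with a = 1 for triangles, b_2 = β, and b_3 = (4/9)α + (1/2)(1 − α) + (1/16)(1 − β), all in units of (1 − ε)opt. Several of these fractions were recovered from a damaged extraction, so they are assumptions. The sketch scans a grid of (α, β) values that includes the breakpoint α = 9/169, β = 89/169, and reports the worst value of max(b_1, b_2, b_3). The function names are hypothetical.

```python
from fractions import Fraction


def best_bound(alpha, beta, a1_triangle_share):
    """max of the (assumed) lower bounds b1, b2, b3, in units of (1 - eps) opt."""
    half = Fraction(1, 2)
    b1 = a1_triangle_share * alpha + half * (1 - alpha)            # Algorithm A1
    b2 = beta                                                      # Algorithm A2
    b3 = Fraction(4, 9) * alpha + half * (1 - alpha) + Fraction(1, 16) * (1 - beta)  # A3
    return max(b1, b2, b3)


def worst_case(a1_triangle_share, grid):
    pts = [Fraction(i, grid) for i in range(grid + 1)]
    return min(best_bound(a, b, a1_triangle_share) for a in pts for b in pts)


print(worst_case(Fraction(1), 169))     # 89/169 for triangle packing
print(worst_case(Fraction(2, 3), 67))   # 35/67 if A1 keeps only 2/3 of each triangle
```

The second call corresponds to the 2-path variant of Section 4, where Algorithm A1 keeps the two heaviest edges of each triangle of C.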

The time-consuming parts of the algorithm are the computation of a maximum 2-matching and the dynamic program in Algorithm A2. The first can be computed in time O(n^3) as in [8]. Since the latter is executed on cycles with at most ε^{-1} + 1 edges each, it also takes O(n^3) for any constant ε.

4 2-Path Packing 

Consider now the MAXIMUM 2-PATH PACKING PROBLEM. We apply Algorithm WTP with two slight changes: we do not complete 2-paths into triangles, and in Algorithm A1 we select the two heaviest edges of every triangle of C to form a 2-path. The analysis is identical, except that the bound guaranteed by Algorithm A1 is only w(TP_1) ≥ [(2/3)α + (1/2)(1 − α)]w(C). The resulting approximation bound is 35/67 − ε.
Abstract. Two equal length strings s and s', over alphabets Σ_s and Σ_{s'}, parameterize match if there exists a bijection π : Σ_s → Σ_{s'} such that π(s) = s', where π(s) is the renaming of each character of s via π. Approximate parameterized matching is the problem of finding for a pattern p, at each location of a text string t, a bijection π that maximizes the number of characters that are mapped from p to the appropriate |p|-length substring of t.

Our main result is an O(nk^{1.5} + mk log m) time algorithm for this problem where m = |p| and n = |t|.



1 Introduction 

In the traditional pattern matching model one seeks exact occurrences of a given pattern p in a text t, i.e. text locations where every text symbol is equal to its corresponding pattern symbol. For two equal length strings s and s' we say that s is a parameterized match, p-match for short, of s' if there exists a bijection π from the alphabet of s to the alphabet of s' such that every symbol of s' is equal to the image under π of the corresponding symbol of s. In the parameterized matching problem, introduced by Baker [6,8], one seeks text locations for which the pattern p p-matches the substring of length |p| beginning at that location.

Baker introduced parameterized matching for applications that arise in software 
tools for analyzing source code. It turns out that the parameterized matching 
problem arises in other applications such as in image processing and computational biology, see [2].

In [6,8] an optimal, linear time algorithm was given for p-matching. However, it was assumed that the alphabet was of constant size. Amir et al. [4] presented tight bounds for p-matching in the presence of an unbounded size alphabet.

In [7] a novel method was presented for parameterized matching by constructing parameterized suffix trees, which also allows for online p-matching. The parameterized suffix trees are constructed by converting the pattern string into a predecessor string. A predecessor string of a string s has at each location i the distance between i and the location containing the previous appearance of the symbol. The first appearance of each symbol is replaced with a 0. For example, the predecessor string of aabbaba is 0, 1, 0, 1, 3, 2, 2. A simple and well-known fact is that:



Lemma 1. s and s' p-match iff they have the same predecessor string. 
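A predecessor string can be computed in one left-to-right pass. The sketch below uses 0-based positions, which does not change the distances, and hypothetical function names. By Lemma 1, comparing predecessor strings then decides whether two equal-length strings p-match.

```python
def predecessor_string(s):
    """Distance from each position to the previous occurrence of the same symbol, 0 if none."""
    last_seen = {}
    out = []
    for i, ch in enumerate(s):
        out.append(i - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = i
    return out


def p_match(s, t):
    """Lemma 1: equal-length strings p-match iff their predecessor strings coincide."""
    return len(s) == len(t) and predecessor_string(s) == predecessor_string(t)


print(predecessor_string("aabbaba"))   # [0, 1, 0, 1, 3, 2, 2]
print(p_match("aabbaba", "xxyyxyx"))   # True
print(p_match("aabbaba", "xxyyxyy"))   # False
```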



The parameterized suffix tree was further explored by Kosaraju [15] and faster 
constructions were given by Cole and Hariharan [10]. 

One of the interesting problems in web searching is searching for color images, 
see [1,19,21]. The simplest possible case is searching for an icon in a screen, a 
task that the Human-Computer Interaction Lab at the University of Maryland 
was confronted with. If the colors are fixed, this is exact 2-dimensional pattern 
matching [3]. However, if the color maps in pattern and text differ, the exact 
matching algorithm would not find the pattern. Parameterized 2-dimensional 
search is precisely what is needed. A nearly optimal algorithm to solve the 2-dimensional parameterized matching problem was given in [2].

In reality, when searching for an image, one needs to take into account the occurrence of errors that distort the image. Errors occur in various forms, depending upon the application. The two most classical distance metrics are the Hamming distance and the edit distance [18].

In [9] the parameterized match problem was considered in conjunction with 
the edit distance. Here the definition of edit distance was slightly modified so 
that the edit operations are defined to be insertion, deletion and parameterized 
replacements, i.e. the replacement of a substring with a string that p-matches it. 
An algorithm for finding the “p-edit distance” of two strings was devised whose 
efficiency is close to the efficiency of the algorithms for computing the classical 
edit distance. Also an algorithm was suggested for the decision variant of the 
problem where a parameter k is given and it is necessary to decide whether the 
strings are within (p-edit) distance k. 

However, it turns out that the operation of parameterized replacement relaxes the problem to an easier problem. A more rigid, but more realistic, definition for the Hamming distance variant was given in [5]. For a pair of equal length strings s and s' and a bijection π defined on the alphabet of s, the π-mismatch is the Hamming distance between the image under π of s and s'. The minimal π-mismatch over all bijections π is the approximate parameterized match. The problem considered in [5] is to find for each location i of a text t the approximate parameterized match of a pattern p with the substring beginning at location i. In [5] the problem was defined and linear time algorithms were given for the case where the pattern is binary or the text is binary. However, this solution does not carry over to larger alphabets.

Unfortunately, under this definition the methods for classical string matching with errors for Hamming distance, e.g. [14,17], also known as pattern matching with mismatches, seem to fail. For example, the method in [14] uses a suffix tree and precomputed lowest common ancestor (LCA) information.

Attempting to apply this technique using a parameterized suffix tree fails, because separate parameterized matches of substrings do not necessarily add up to a parameterized match. See the example below (substring 1-4 and substring 6-9): different π's are required for these p-matches. This also emphasizes why the definition of [9] is a simplification.

p = ababaabaa... 

t = ... c d c d d e f e e ... 
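The failure can be reproduced directly. The sketch below uses a hypothetical helper that builds the renaming π greedily and rejects any conflict in either direction, which enforces that π is a bijection. It uses only the nine characters displayed above, so substrings 1-4 and 6-9 correspond to the Python slices [0:4] and [5:9].

```python
def p_match_bijection(s, t):
    """Return the renaming pi with pi(s) = t if s and t p-match, else None."""
    if len(s) != len(t):
        return None
    fwd, back = {}, {}
    for a, b in zip(s, t):
        # a conflict in either direction means no bijection maps s onto t
        if fwd.setdefault(a, b) != b or back.setdefault(b, a) != a:
            return None
    return fwd


p = "ababaabaa"
t = "cdcddefee"
print(p_match_bijection(p[0:4], t[0:4]))   # {'a': 'c', 'b': 'd'}
print(p_match_bijection(p[5:9], t[5:9]))   # {'a': 'e', 'b': 'f'}
print(p_match_bijection(p, t))             # None
```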

In this paper we consider the problem of parameterized matching with k mis- 
matches. The parameterized matching problem with k mismatches seeks all p- 
matches of a pattern p in a text t, with at most k mismatches. 

For the case where |p| = |t| = m, which we call the string comparison problem, we show that this problem is in a sense equivalent to the maximum matching problem on graphs. An algorithm for the problem follows by using maximum matching algorithms. A matching "lower bound" follows, in the sense that an o(m^{1.5}) algorithm implies better algorithms for maximum matching. For the string comparison problem with threshold k, we show an O(m + k^{1.5}) time algorithm and a corresponding "lower bound".

The main result of the paper is an algorithm that solves the parameterized matching with k mismatches problem, given a pattern of length m and a text of length n, in O(nk^{1.5} + mk log m) time. This immediately yields a 2-dimensional algorithm of time complexity O(n^2 m k^{1.5} + m^2 k log m), where |p| = m^2 and |t| = n^2.

Roadmap: In section 2 we give preliminaries and definitions of the problem. In 
section 3 we present an algorithm for the string comparison problem. We also 
reduce maximum matching to the string comparison problem. In section 4 we 
present an algorithm for the parameterized matching with k mismatches. 



2 Preliminaries and Definitions 

Given a string s = s_1 s_2 ... s_n of length |s| = n over an alphabet Σ_s, and a bijection π from Σ_s to some other alphabet Σ_{s'}, the image of s under π is the string s' = π(s) = π(s_1) · ... · π(s_n) that is obtained by applying π to all characters of s.

Given two equal-length strings w and u over alphabets Σ_w and Σ_u and a bijection π from Σ_w to Σ_u, the π-mismatch between w and u is Ham(π(w), u), where Ham(s, t) is the Hamming distance between s and t. We say that w parameterized k-matches u if there exists a π such that the π-mismatch is at most k. The approximate parameterized match between w and u is the minimum π-mismatch (over all bijections π).

For a given input text x, |x| = n, and pattern y, |y| = m, the problem of approximate parameterized matching for y in x consists of computing the approximate parameterized match between y and every (consecutive) substring of length m of x. Hence, approximate parameterized searching requires computing the π yielding minimum π-mismatch for y at each position of x. Of course, the best π is not necessarily the same at every position. The problem of parameterized matching with k mismatches consists of computing for each location i of x whether y parameterized k-matches the (consecutive) substring of length m beginning at location i. Sometimes π itself is also desired (however, here any π with π-mismatch of no more than k will be satisfactory).



3 String Comparison Problem 

We begin by evaluating two equal-length strings for a parameterized k-match as
follows:

— Input: Two strings, $s = s_1, s_2, \ldots, s_m$ and $s' = s'_1, s'_2, \ldots, s'_m$, and an integer k.
— Output: True, if there exists $\pi$ such that the $\pi$-mismatch of s and s' is $\le k$.

The naive method to solve this problem is to check the $\pi$-mismatch for every
possible $\pi$. However, this takes exponential time.

A slightly more clever way to solve the problem is to reduce it to a maximum
weight bipartite matching problem. We construct a bipartite graph
$B = (U \cup V, A)$ in the following way: $U = \Sigma_s$ and $V = \Sigma_{s'}$. There is an edge
between $a \in \Sigma_s$ and $b \in \Sigma_{s'}$ iff there is at least one a in s that is aligned with a
b in s'. The weight of this edge is the number of a's in s that are aligned to b's
in s'. It is easy to see that:

Lemma 2. A maximum weight matching in B corresponds to a minimum
$\pi$-mismatch, where $\pi$ is defined by the edges of the matching.

In particular, the minimum $\pi$-mismatch equals m minus the weight of a maximum
weight matching in B.
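Lemma 2 can be exercised with a small Python sketch on tiny inputs. For brevity it swaps in the classic $O(V^3)$ Hungarian (Kuhn-Munkres) method on a square, zero-padded weight matrix instead of the faster algorithms cited below; the minimum $\pi$-mismatch is then m minus the maximum matching weight. Both function names are hypothetical.

```python
def hungarian_min_cost(cost):
    """Minimum-cost perfect matching on a square matrix (classic O(n^3) Hungarian method)."""
    n = len(cost)
    INF = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)        # row / column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)      # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:                            # grow a shortest augmenting path
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                              # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))


def min_pi_mismatch_via_matching(s, t):
    """Lemma 2: minimum pi-mismatch = m - maximum weight of a matching in B."""
    assert len(s) == len(t)
    if not s:
        return 0
    U, V = sorted(set(s)), sorted(set(t))
    ui = {a: r for r, a in enumerate(U)}
    vi = {b: c for c, b in enumerate(V)}
    n = max(len(U), len(V))
    cost = [[0] * n for _ in range(n)]         # zero-padded square matrix
    for a, b in zip(s, t):
        cost[ui[a]][vi[b]] -= 1                # cost = -(number of aligned a/b pairs)
    return len(s) + hungarian_min_cost(cost)
```

Padding with zero-weight entries is harmless: a maximum weight matching never needs a zero-weight edge, so the optimum of the padded assignment problem equals the optimum of the original matching problem.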

The problem of maximum bipartite matching has been widely studied and
efficient algorithms have been devised, e.g. [11,12,13,20]. For integer weights, where
the largest weight is bounded by |V|, the fastest algorithm runs in time $O(E\sqrt{V})$
[20]. Since B has at most m edges and at most 2m vertices, this solution yields an
$O(m^{1.5})$ time algorithm for the problem of parameterized string comparison with
mismatches. The advantage of the algorithm is that it views the whole string at once
and efficiently finds the best function. In general, for the parameterized pattern
matching problem with inputs t of length n and p of length m, the bipartite matching
solution yields an $O(nm^{1.5})$ time algorithm.
3.1 Reducing Maximum Matching to the String Comparison Problem

Lemma 3. Let $B = (U \cup V, E)$ be a bipartite graph. If the string comparison
problem can be solved in $O(f(m))$ time then a maximum matching in B can be
found in $O(f(|E|))$ time.

Since it is always true that $|E| \le |V|^2$, it follows that $|E|^{1.25} \le |E|\sqrt{|V|}$. Hence,
it would be surprising to solve the string comparison problem in $o(m^{1.25})$ time. In
fact, even for graphs with a linear number of edges the best known algorithm is
$O(E\sqrt{V})$, and hence an algorithm for the string comparison problem in $o(m^{1.5})$
time would have implications on the maximum matching problem as well.

Note that for the string comparison problem one can reduce the approximate
parameterized matching version to the parameterized matching with k mismatches
version. This is done by binary searching on k to find the optimal solution for the
string comparison problem in the approximate parameterized matching version.
Hence an algorithm for detecting whether the string comparison problem has a
bijection with at most k mismatches in time $o(m + k^{1.5})$ would imply a
faster maximum matching algorithm.
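The binary search on k can be sketched in Python as follows. The decision procedure is passed in as a parameter; the exhaustive `brute_force_decider` used in the example is a hypothetical stand-in (exponential in the alphabet size) for any parameterized k-mismatch tester. The search relies on monotonicity: a parameterized k-match is also a (k+1)-match, and every pair of strings of length m is trivially an m-match.

```python
from itertools import permutations


def brute_force_decider(s, t, k):
    """Hypothetical oracle: True iff some injective relabelling of s has <= k mismatches with t."""
    sigma = sorted(set(s))
    targets = sorted(set(t)) + [("fresh", i) for i in range(len(sigma))]
    for image in permutations(targets, len(sigma)):
        pi = dict(zip(sigma, image))
        if sum(pi[a] != b for a, b in zip(s, t)) <= k:
            return True
    return False


def optimal_k(s, t, decide):
    """Smallest k with decide(s, t, k) True, using O(log m) oracle calls."""
    lo, hi = 0, len(s)           # decide(s, t, len(s)) always holds
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(s, t, mid):
            hi = mid             # a mid-match exists: the optimum is <= mid
        else:
            lo = mid + 1         # no mid-match: the optimum is > mid
    return lo
```

For instance, `optimal_k("abab", "aabb", brute_force_decider)` returns the approximate parameterized match 2.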



3.2 Mismatch Pairs and Parameterized Properties 

Obviously, one may use the solution of approximate parameterized string
matching to solve the problem of parameterized matching with k mismatches. However,
it is desirable to construct an algorithm with running time dependent on k rather
than m. The bipartite matching method does not seem to give insight into how
to achieve this goal. In order to give an algorithm dependent on k we introduce a
new method for detecting whether two equal-length strings have a parameterized
k-match. The ideas used in this new comparison algorithm will be used in the
following section to yield an efficient solution for the problem of parameterized
matching with k mismatches.

When faced with the problem of p-matching with mismatches, the initial dif- 
ficulty is to decide, for a given location, whether it is a match or a mismatch. 
Simple comparisons do not suffice, since any symbol can match (or mismatch) 
every other symbol. In the bipartite solution, this difficulty is overcome by view- 
ing the entire string at once. A bijection is found, over all characters, excluding 
at most k locations. Thus, it is obvious that the locations that are excluded from 
the matching are “mismatched” locations. For our purposes, we would like to 
be able to decide locally, i.e. before $\pi$ is ascertained, whether a given location is
“good” or “bad.”

Definition 1 (mismatch pair). Let s and s' be two equal-length strings. A
mismatch pair between s and s' is a pair of locations (i, j) such that one of the
following holds:

1. $s_i = s_j$ and $s'_i \ne s'_j$;
2. $s_i \ne s_j$ and $s'_i = s'_j$.
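The two rules of Definition 1 amount to saying that the equalities $s_i = s_j$ and $s'_i = s'_j$ disagree, which gives a one-line test. A quadratic Python sketch with 0-indexed locations (the helper names are our own):

```python
from itertools import combinations


def is_mismatch_pair(s, t, i, j):
    """Definition 1: (s_i = s_j) and (t_i = t_j) have different truth values."""
    return (s[i] == s[j]) != (t[i] == t[j])


def mismatch_pairs(s, t):
    """All mismatch pairs (i, j), i < j, by direct application of the definition (O(m^2))."""
    return [(i, j) for i, j in combinations(range(len(s)), 2)
            if is_mismatch_pair(s, t, i, j)]
```

For `s = "aab"` and `s' = "abb"` this yields (0, 1) by rule 1 and (1, 2) by rule 2.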

It immediately follows from the definition that:

Lemma 4. Given two equal-length strings, s and s', if (i, j) is a mismatch pair
between s and s', then for every bijection $\pi : \Sigma_s \to \Sigma_{s'}$ either location i or
location j, or both locations i and j, are mismatches, i.e. $\pi(s_i) \ne s'_i$ or $\pi(s_j) \ne s'_j$
or both are unequal.

Conversely,

Lemma 5. Let s and s' be two equal-length strings and let $S \subseteq \{1, \ldots, m\}$ be a
set of locations which does not contain any mismatch pair. Then there exists a
bijection $\pi : \Sigma_s \to \Sigma_{s'}$ that is parameterized on S, i.e. for every $i \in S$, $\pi(s_i) = s'_i$.

The idea of our algorithm is to count the mismatch pairs between s and s'. Since
each pair contributes at least one error in the p-match between s and s', it follows
immediately from Lemma 4 that if there are more than k disjoint mismatch pairs then s
does not parameterized k-match s'. We claim that if a maximal disjoint collection
(defined next) has fewer than k/2 + 1 mismatch pairs then s parameterized k-matches s'.

Definition 2. Given two equal-length strings, s and s' , a collection L of mis- 
match pairs is said to be a maximal disjoint collection if (1) all mismatch pairs in 
L are disjoint, i.e. do not share a common location, and (2) there is no mismatch 
pair that can be added to L without violating disjointness. 

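Corollary 1. Let s and s' be two strings, and let L be a maximal disjoint
collection of mismatch pairs of s and s'. If |L| > k, then for every $\pi : \Sigma_s \to \Sigma_{s'}$,
the $\pi$-mismatch is greater than k. If $|L| \le k/2$ then there exists $\pi : \Sigma_s \to \Sigma_{s'}$
such that the $\pi$-mismatch is less than or equal to k.

Proof. Combining Lemma 4 and Lemma 5 yields the proof: by Lemma 4 each of the
|L| disjoint pairs forces its own mismatch, and by the maximality of L the locations
outside the pairs of L contain no mismatch pair, so Lemma 5 gives a $\pi$ that errs at
most on the 2|L| locations of L. □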
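A Python sketch of how Corollary 1 is used as a test (0-indexed locations; names are hypothetical). It builds a maximal disjoint collection greedily in $O(m^2)$ time, a simple stand-in for the linear-time stack method of Section 3.3: scanning all pairs and keeping each mismatch pair whose two locations are still unused cannot leave an addable pair behind. The result `"verify"` marks the case $k/2 < |L| \le k$, which still needs the matching-based verification.

```python
from itertools import combinations


def is_mismatch_pair(s, t, i, j):
    return (s[i] == s[j]) != (t[i] == t[j])


def greedy_maximal_disjoint(s, t):
    """Keep every mismatch pair whose endpoints are both still unused."""
    used, L = set(), []
    for i, j in combinations(range(len(s)), 2):
        if i not in used and j not in used and is_mismatch_pair(s, t, i, j):
            L.append((i, j))
            used.update((i, j))
    return L


def classify(s, t, k):
    """Corollary 1: 'no' if |L| > k, 'yes' if |L| <= k/2, otherwise 'verify'."""
    L = greedy_maximal_disjoint(s, t)
    if len(L) > k:
        return "no"
    if 2 * len(L) <= k:
        return "yes"
    return "verify"
```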



3.3 String Comparison Algorithm 

The method uses Corollary 1. First the mismatch pairs need to be found. In
fact, by Corollary 1 only k + 1 of them need to be found, since k + 1 disjoint mismatch
pairs imply that there is no parameterized k-match. After the mismatch pairs
are found, if the number of mismatch pairs mp is at most k/2 or more than k,
we can announce a match or a mismatch, respectively, immediately according to
Corollary 1. The difficult case is to detect whether there indeed is a parameterized
k-match for $k/2 < mp \le k$. This is done with the bipartite matching algorithm. However,
here the bipartite graph needs to be constructed somewhat differently. While the
case where $mp \le k/2$ implies an immediate match, for simplicity of presentation
we will not differentiate between the case of $mp \le k/2$ and $k/2 < mp \le k$. Thus
from here on we will only bother with the cases $mp \le k$ and $mp > k$.



Find Mismatch Pairs. In order to find the mismatch pairs of equal-length
strings s and s' we use a simple stack scheme, one stack for each character in
the alphabet. First, mismatch pairs are searched for according to the first rule
of Definition 1, i.e. $s_i = s_j$ but $s'_i \ne s'_j$. Then, within the leftover locations,
mismatch pairs are sought that satisfy the second rule of Definition 1, i.e. $s_i \ne s_j$
but $s'_i = s'_j$.

Algorithm: Find mismatch pairs(s, s')

1. Construct a stack S_a for each a ∈ Σ_s.
2. For i ← 1 to m do
     if empty(S_{s_i}) then push(S_{s_i}, <s'_i, i>)
     else if top(S_{s_i}) = <s'_i, *> then push(S_{s_i}, <s'_i, i>)
     else <s'_*, j> ← pop(S_{s_i}); add (i, j) to the mismatch pairs
3. Set L to be all locations of {1, ..., m} that do not appear in the
   mismatch pairs.
4. Reverse the roles of s and s' and do step 2 while scanning items
   only from L (i.e. here step 2 begins "For j ∈ L do").



Time Complexity: The algorithm that finds all mismatch pairs between two
strings of length m runs in O(m) time. In each of the two passes, every location
is pushed onto, and popped from, at most one stack.
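The stack scheme translates into the following Python sketch (0-indexed; each stack holds locations, and the symbol of the other string is read off the location on top). Keeping the stacks in a `dict` keyed by symbol gives expected O(m) time. After the first pass, all locations left on one stack share the same symbol of s', so the leftover set has no rule-1 conflict; the second pass, with the roles reversed, removes the rule-2 conflicts, so the returned pairs form a maximal disjoint collection.

```python
def find_mismatch_pairs(s, t):
    """Maximal disjoint collection of mismatch pairs via the two-pass stack scheme."""
    assert len(s) == len(t)
    pairs = []

    def one_pass(x, y, locations):
        stacks = {}                                   # one stack per symbol of x
        for i in locations:
            st = stacks.setdefault(x[i], [])
            if not st or y[st[-1]] == y[i]:
                st.append(i)                          # agrees with the top: push
            else:
                pairs.append((st.pop(), i))           # x_i = x_j but y_i != y_j
        return [i for st in stacks.values() for i in st]   # unpaired locations

    leftover = one_pass(s, t, range(len(s)))          # rule 1 of Definition 1
    one_pass(t, s, sorted(leftover))                  # rule 2: roles reversed
    return pairs
```

For example, `find_mismatch_pairs("ab", "cc")` finds nothing in the first pass but returns `[(0, 1)]` from the second.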



Verification. The verification is performed only when the number of mismatch
pairs that were found, mp, satisfies $mp \le k$. Verification consists of a procedure
that finds the minimal $\pi$-mismatch over all bijections $\pi$. The technique used is
similar to the bipartite matching algorithm discussed in the beginning of the
section.

Let $L \subseteq \{1, \ldots, m\}$ be the locations that appear in a mismatch pair. We say that
a symbol $a \in \Sigma_s$ is mismatched if there is a location $i \in L$ such that $s_i = a$.
Likewise, $b \in \Sigma_{s'}$ is mismatched if there is a location $i \in L$ such that $s'_i = b$. A
symbol $a \in \Sigma_s$, or $b \in \Sigma_{s'}$, is free if it is not mismatched.

Construct a bipartite graph $B = (U \cup V, E)$ defined as follows. U contains all
mismatched symbols $a \in \Sigma_s$ and V contains all mismatched symbols $b \in \Sigma_{s'}$.
Moreover, U contains all free symbols $a \in \Sigma_s$ for which there is a location
$i \in \{1, \ldots, m\} - L$ such that $s_i = a$ and $s'_i$ is a mismatched symbol. Likewise, V
contains all free symbols $b \in \Sigma_{s'}$ for which there is a location $i \in \{1, \ldots, m\} - L$
such that $s'_i = b$ and $s_i$ is a mismatched symbol. The edges, as before, have
weights that correspond to the number of locations where their endpoints are aligned
with each other.

Lemma 6. The bipartite graph B is of size O(k). Moreover, given the mismatch
pairs it can be constructed in O(k) time.

While the lemma states that the bipartite graph is of size O(k), it may still be the
case that the edge weights are substantially larger. However, if there exists
an edge e = (a, b) with weight > k then it must be that $\pi(a) = b$, for otherwise
we immediately have > k mismatches. Thus, before computing the maximum
matching, we drop every edge with weight > k, along with its endpoints. What
remains is a bipartite graph with edge weights between 1 and k.
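A Python sketch of the graph B, including the pruning of heavy edges (0-indexed; `verification_graph` is a hypothetical name). A location contributes to an edge exactly when at least one of its two symbols is mismatched, which matches the vertex sets U and V defined above. This sketch scans all m locations, so it takes O(m) time rather than the O(k) of Lemma 6, it stops before computing the matching, and it does not check for two forced edges sharing an endpoint (which would rule out a k-match).

```python
from collections import Counter


def verification_graph(s, t, pairs, k):
    """Return (forced, residual): edges of weight > k, and the remaining weighted edges.

    pairs is a maximal disjoint collection of mismatch pairs between s and t.
    """
    in_L = {x for p in pairs for x in p}
    mism_s = {s[i] for i in in_L}                 # mismatched symbols of s
    mism_t = {t[i] for i in in_L}                 # mismatched symbols of t
    weights = Counter()
    for a, b in zip(s, t):
        if a in mism_s or b in mism_t:            # location touches U or V
            weights[(a, b)] += 1
    forced = {e for e, w in weights.items() if w > k}   # any k-match must map a to b
    gone_s = {a for a, _ in forced}
    gone_t = {b for _, b in forced}
    residual = {(a, b): w for (a, b), w in weights.items()
                if a not in gone_s and b not in gone_t}
    return forced, residual
```

For `s = "aabcc"`, `s' = "abbdd"` and the single pair (0, 1), the free symbols c and d never meet a mismatched symbol, so the edge (c, d) is left out of B.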

Theorem 1. Given two equal-length strings s and s' with $mp \le k$ mismatch
pairs, it is possible to verify whether there is a parameterized k-match between s
and s' in $O(k^{1.5})$ time.

Time Complexity: Given two equal-length strings s and s', it is possible to
determine whether s parameterized k-matches s' in $O(m + k^{1.5})$ time: O(m) to
find the mismatch pairs and $O(k^{1.5})$ to check these pairs with the appropriate
bipartite matching.



4 The Algorithm 



We are now ready to introduce our algorithm for the problem of parameterized 
pattern matching with k mismatches: 

— Input: Two strings, $t = t_1, t_2, \ldots, t_n$ and $p = p_1, p_2, \ldots, p_m$, and an integer k.

— Output: All locations i in t where p parameterized k-matches $t_i, \ldots, t_{i+m-1}$.

Our algorithm has two phases, the pattern preprocessing phase and the text 
scanning phase. In this section we present the text scanning phase and will 
assume that the pattern preprocessing phase is given. In the following section 
we describe an efficient method to preprocess the pattern for the needs of the 
text scanning phase. 

Definition 3. Let s and s' be two equal-length strings. Let L be a collection of
disjoint mismatch pairs between s and s'. If L is maximal and $|L| \le k$ then L is
said to be a k-good witness for s and s'. If |L| > k then L is said to be a k-bad
witness for s and s'.
The text scanning phase has two stages: (a) the filter stage and (b) the verification
stage. In the filter stage, for each text location i we find either a k-good witness
or a k-bad witness for p and $t_i \ldots t_{i+m-1}$. Obviously, by Corollary 1, if there is
a k-bad witness then p cannot parameterized k-match $t_i \ldots t_{i+m-1}$. Hence, after
the filter stage, it remains to verify, for those locations i which have a k-good
witness, whether p parameterized k-matches $t_i \ldots t_{i+m-1}$. The verification stage is
identical to the verification procedure in Section 3.3 and hence we will not dwell
on this stage.
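As a baseline for the filter stage, the Python sketch below computes a k-good or k-bad witness for every text location independently, in O(nm) total time, without reusing earlier locations or the pattern preprocessing (so it is not the O(nk) filter of Section 4.1). It is our own online substitute for the two-pass stack method: the unpaired locations never contain a mismatch pair, so they define consistent symbol maps in both directions, and each new location either conflicts with one of them (forming a mismatch pair) or extends the maps. It also reports the stop location defined in Section 4.1. Names are hypothetical.

```python
def prefix_witness(p, t, i, k):
    """Grow a maximal disjoint collection between p and t[i:i+len(p)] online.

    Stops as soon as k + 1 pairs are found. Returns (pairs, stop), where
    stop = i + j for the prefix length j scanned (i + m if no k-bad witness).
    """
    fwd, bwd = {}, {}   # unpaired: p-symbol -> (t-symbol, locs), t-symbol -> (p-symbol, locs)
    pairs = []
    for j in range(len(p)):
        a, b = p[j], t[i + j]
        partner = None
        if a in fwd and fwd[a][0] != b:        # rule 1: same p-symbol, other t-symbol
            partner = next(iter(fwd[a][1]))
        elif b in bwd and bwd[b][0] != a:      # rule 2: same t-symbol, other p-symbol
            partner = next(iter(bwd[b][1]))
        if partner is None:                    # j is consistent with all unpaired locations
            fwd.setdefault(a, (b, set()))[1].add(j)
            bwd.setdefault(b, (a, set()))[1].add(j)
            continue
        for table, key in ((fwd, p[partner]), (bwd, t[i + partner])):
            table[key][1].discard(partner)     # partner is no longer unpaired
            if not table[key][1]:
                del table[key]
        pairs.append((partner, j))
        if len(pairs) == k + 1:
            return pairs, i + j + 1
    return pairs, i + len(p)


def naive_filter(p, t, k):
    """('bad' or 'good', stop location) for every text location, O(nm) in total."""
    m = len(p)
    results = []
    for i in range(len(t) - m + 1):
        witness, stop = prefix_witness(p, t, i, k)
        results.append(("bad" if len(witness) > k else "good", stop))
    return results
```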



4.1 The Filter Stage 

The underlying idea of the filter stage is similar to [16]. One location after
another is evaluated utilizing the knowledge accumulated at the previous locations
combined with the information gathered in the pattern preprocessing stage. The
information of the pattern preprocessing stage is:

Output of pattern preprocessing: For each location i of p, a maximal disjoint
collection of mismatch pairs L for some pattern prefix $p_1, \ldots, p_{j-i+1}$ and $p_i, \ldots, p_j$
such that either |L| = 3k + 3, or j = m and |L| < 3k + 3.

As the first step of the algorithm we consider the first text location, i.e. when
text $t_1 \ldots t_m$ is aligned with $p_1 \ldots p_m$. Using the method in Section 3.3 we can
find all the mismatch pairs in O(m) time, and hence find a k-good or k-bad
witness for the first text location. When evaluating subsequent locations i we
maintain the following invariant: for all locations l < i we have either a k-good
or a k-bad witness for location l.

It is important to observe that, when evaluating location i of the text, if we
discover a maximal disjoint collection of mismatch pairs of size k + 1, i.e. a
k-bad witness, between a prefix $p_1, \ldots, p_j$ of the pattern and $t_i, \ldots, t_{i+j-1}$, we will
stop our search, since this immediately implies that $p_1, \ldots, p_m$ and $t_i, \ldots, t_{i+m-1}$
have a k-bad witness. We say that i + j is a stop location for i. If we have a k-good
witness at location i then i + m is its stop location. When evaluating location
i of the text, each of the previous locations has a stop location. The location $\ell$
which has a maximal stop location, over all locations l < i, is called a maximal
stopper for i.

The reason for working with the maximal stopper for i is that the text beyond
the maximal stopper has not yet been scanned. If we can show that the time
to compute a k-bad witness or find a maximal disjoint collection of mismatch
pairs for location i up until the maximal stopper, i.e. for $p_1, \ldots, p_{\ell'-i+1}$ with
$t_i, \ldots, t_{\ell'}$, is O(k), then we will spend O(n + nk) time overall:
O(nk) to compute up until the maximal stopper for each location and O(n) for
scanning and updating the current maximal collection of mismatch pairs.
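Given the stop locations of the locations evaluated so far, the maximal stopper can be tracked in a single left-to-right pass. A minimal Python sketch (0-indexed; ties keep the earlier location, the first location has no stopper, and `maximal_stoppers` is a hypothetical name):

```python
def maximal_stoppers(stops):
    """For each location i, the earlier location whose stop location is largest."""
    result, best = [], None
    for i, stop in enumerate(stops):
        result.append(best)                       # stopper for i uses locations < i only
        if best is None or stop > stops[best]:
            best = i
    return result
```

With stop locations `[2, 3, 4, 4, 6]` the maximal stoppers are `[None, 0, 1, 2, 2]`; the text up to `stops[best]` is the part already scanned.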
Up to the Maximal Stopper. The situation now is that we are evaluating
a location i of t. Let $\ell$ be the maximal stopper for i. We utilize two pieces of
precomputed information: (1) the pattern preprocessing for location $i - \ell + 1$
of p and (2) the k-good witness or k-bad witness (of size k + 1) for location
$\ell$ of t. Let $\ell'$ be the stop location of $\ell$. We would like to evaluate the overlap
of $p_1, \ldots, p_{\ell'-i+1}$ with $t_i, \ldots, t_{\ell'}$ and to find a maximal disjoint collection L of
mismatch pairs on this overlap, or, if it exists, a maximal disjoint collection L
of mismatch pairs of size k + 1 on a prefix of the overlap.

There are two cases. One possibility is that the pattern preprocessing returns
3k + 3 mismatch pairs for a prefix $p_1, \ldots, p_j$ and $p_{i-\ell+1}, \ldots, p_{i-\ell+j}$, where
$j < \ell' - i + 1$. Yet, there are at most k + 1 mismatch pairs between $p_1, \ldots, p_{\ell'-\ell+1}$ and
$t_\ell, \ldots, t_{\ell'}$.

Lemma 7. Let s, s' and s'' be three equal-length strings such that there is a
maximal collection of mismatch pairs $L_{s,s'}$ between s and s' of size $\le M$ and
a maximal collection of mismatch pairs $L_{s',s''}$ between s' and s'' of size $\ge 3M$.
Then there must be a maximal collection of mismatch pairs $L_{s,s''}$ between s and
s'' of size $\ge M$.

Set M to be k + 1 and one can use the lemma for the case at hand; namely,
there must be at least k + 1 mismatch pairs between $p_1, \ldots, p_{\ell'-i+1}$ and $t_i, \ldots, t_{\ell'}$,
which defines a k-bad witness.
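One way to see why Lemma 7 holds (our own argument, offered as a sketch rather than the paper's proof): if (i, j) is a mismatch pair between s' and s'', exactly one of the equalities $s'_i = s'_j$ and $s''_i = s''_j$ holds, and whichever way $s_i = s_j$ goes, it disagrees with exactly one of them, so (i, j) is a mismatch pair between s and s' or between s and s''. Pairs of $L_{s',s''}$ that avoid the at most 2M locations of $L_{s,s'}$ cannot be mismatch pairs between s and s' (by maximality of $L_{s,s'}$), and at least 3M - 2M = M of them survive. The Python sketch below builds that collection and extends it greedily to a maximal one; the function names are hypothetical.

```python
def is_mp(x, y, i, j):
    """Mismatch pair test of Definition 1."""
    return (x[i] == x[j]) != (y[i] == y[j])


def extend_to_maximal(x, y, pairs):
    """Greedily add disjoint mismatch pairs between x and y until none can be added."""
    used = {v for p in pairs for v in p}
    pairs = list(pairs)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if i not in used and j not in used and is_mp(x, y, i, j):
                pairs.append((i, j))
                used.update((i, j))
    return pairs


def lemma7_witness(s, s1, s2, L01, L12):
    """Maximal collection for (s, s2) from maximal L01 for (s, s1) and L12 for (s1, s2)."""
    blocked = {v for p in L01 for v in p}
    keep = [p for p in L12 if p[0] not in blocked and p[1] not in blocked]
    assert all(is_mp(s, s2, i, j) for i, j in keep)   # the key observation above
    return extend_to_maximal(s, s2, keep)
```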

The second case is where the pattern preprocessing returns fewer than 3k + 3
mismatch pairs, or it returns 3k + 3 mismatch pairs but does so for a prefix
$p_1, \ldots, p_j$ and $p_{i-\ell+1}, \ldots, p_{i-\ell+j}$ where $j \ge \ell' - i + 1$. However, we still have
an upper bound of k + 1 mismatch pairs between $p_1, \ldots, p_{\ell'-\ell+1}$ and $t_\ell, \ldots, t_{\ell'}$.
So, we can utilize the fact that:

Lemma 8. Given three strings, s, s', s'', such that there are maximal disjoint
collections of mismatch pairs $L_{s,s'}$ and $L_{s',s''}$ of sizes O(k), in O(k) time one
can find a k-good or k-bad witness for s and s''.



Putting it Together. The filter stage takes O(nk) time and the verification
stage takes $O(nk^{1.5})$. Hence,

Theorem 2. Given the preprocessing stage, we can announce for each location
i of t whether p parameterized k-matches $t_i, \ldots, t_{i+m-1}$ in $O(nk^{1.5})$ time.

5 Pattern Preprocessing 

In this section we solve the pattern preprocessing necessary for the general al- 
gorithm. 
— Input: A pattern $p = p_1, \ldots, p_m$ and an integer k.

— Output: For each location i, $1 \le i \le m$, a maximal disjoint collection of
mismatch pairs L for the minimal pattern prefix $p_1, \ldots, p_{j-i+1}$ and $p_i, \ldots, p_j$
such that either |L| = 3k + 3, or j = m and |L| < 3k + 3.

The naive method takes $O(m^2)$ time by applying the method from Section 3.3.
However, this does not exploit any previous information for a new iteration. Since
every alignment combines two previous alignments, we can get a better result.
Assume, w.l.o.g., that m is a power of 3. We divide the set of alignments into
$\log_3 m + 1$ sequences as follows:

$R_0 = [1]$, $R_1 = [2, 3]$, $R_2 = [4, 5, 6, 7, 8, 9]$, ..., $R_i = [3^{i-1} + 1, \ldots, 3^i]$, ...,
$R_{\log_3 m} = [3^{\log_3 m - 1} + 1, \ldots, 3^{\log_3 m}]$, for $1 \le i \le \log_3 m$.

At each step, we compute the desired mismatches for each set of alignments $R_i$.
Step i uses the information computed in the previous steps. We further divide each set
into two halves, and compute the first half followed by the second half. This is
possible since each $R_i$ contains an even number of elements, as $3^i - 3^{i-1} = 2 \cdot 3^{i-1}$.
We split each sequence $R_i$ into two equal-length sequences $R_i^1 = [3^{i-1} + 1, \ldots, 2 \cdot 3^{i-1}]$
and $R_i^2 = [2 \cdot 3^{i-1} + 1, \ldots, 3^i]$. For each of the new sequences we have that:

Lemma 9. If $r_1, r_2 \in R_i^1$ (the first half of set $R_i$) or $r_1, r_2 \in R_i^2$ (the second
half of $R_i$) such that $r_1 < r_2$, then $r_2 - r_1 \in R_j$ for some $j < i$.
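The grouping of alignments and Lemma 9 can be checked mechanically. A Python sketch, assuming m is a power of 3 and including the singleton group [1] for the first alignment (function names are hypothetical):

```python
def alignment_groups(m):
    """[1], then R_i = [3^(i-1)+1, ..., 3^i] for i = 1, ..., log_3 m (m a power of 3)."""
    groups, lo = [[1]], 1
    while lo < m:
        groups.append(list(range(lo + 1, 3 * lo + 1)))
        lo *= 3
    return groups


def halves(group):
    """Split R_i into its first half R_i^1 and second half R_i^2."""
    h = len(group) // 2
    return group[:h], group[h:]


def check_lemma9(m):
    """True iff every difference inside a half of R_i lies in an earlier group."""
    groups = alignment_groups(m)
    index = {r: g for g, grp in enumerate(groups) for r in grp}
    for i, grp in enumerate(groups[1:], start=1):
        for half in halves(grp):
            for a in half:
                for b in half:
                    if a < b and index[b - a] >= i:
                        return False
    return True
```

The check succeeds because two elements of the same half differ by at most $3^{i-1} - 1$, which lies in one of the groups before $R_i$.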

This gives a handle on how to compute our desired output. Denote $f_i = \min\{j \in R_i^1\}$
and $m_i = \min\{j \in R_i^2\}$ the representatives of their sequences. We compute
the following two stages for each group $R_i$, in order $R_1, \ldots, R_{\log_3 m}$:

1. Compute a maximal disjoint collection of mismatch pairs L for some pattern
prefix $p_1, \ldots, p_{j+1}$ and $p_{f_i}, \ldots, p_{f_i+j}$ such that $|L| = (3k + 3)\,3^{\log_3 m - i}$, or
$f_i + j = m$ and $|L| < (3k + 3)\,3^{\log_3 m - i}$. Do the same with $m_i$.

2. Now for each $j \in R_i^1$ apply the algorithm for the text described in the
previous section on the pattern p, the pattern shifted by $f_i$ ($R_i^1$'s representative)
and the pattern shifted by j. Do the same with $m_i$ for $R_i^2$.

The central idea and importance behind the choice of the number of mismatch 
pairs that we seek is to satisfy Lemma 7. It can be verified that indeed our choice 
of sizes always satisfies that there are 3 times as many mismatch pairs as in the 
previous iteration. Therefore, 

Theorem 3. Given a pattern p of length m, it is possible to precompute its
3k + 3 mismatch pairs at each alignment in $O(km \log_3 m)$ time.



Corollary 2. Given a pattern p and text t, we can solve the parameterized
matching with k mismatches problem in $O(nk^{1.5} + km \log m)$ time.



Approximate Parameterized Matching 425 



References 

1. A. Amir, K. W. Church, and E. Dar. Separable attributes: a technique for solving
the submatrices character count problem. In Proc. 13th ACM-SIAM Symp. on
Discrete Algorithms (SODA), pages 400-401, 2002.

2. A. Amir, Y. Aumann, R. Cole, M. Lewenstein, and E. Porat. Function matching: 
Algorithms, applications and a lower bound. In Proc. of the 30th International 
Colloquium on Automata, Languages and Programming (ICALP), pages 929-942, 
2003. 

3. A. Amir, G. Benson, and M. Farach. An alphabet independent approach to two 
dimensional pattern matching. SIAM J. Comp., 23(2):313-323, 1994. 

4. A. Amir, M. Farach, and S. Muthukrishnan. Alphabet dependence in parameter- 
ized matching. Information Processing Letters, 49:111-115, 1994. 

5. A. Apostolico, P. Erdos, and M. Lewenstein. Parameterized matching with
mismatches. Manuscript.

6. B. S. Baker. A theory of parameterized pattern matching: algorithms and
applications. In Proc. 25th Annual ACM Symposium on the Theory of Computing,
pages 71-80, 1993.

7. B. S. Baker. Parameterized string pattern matching. J. Comput. Systems Sci., 52, 
1996. 

8. B. S. Baker. Parameterized duplication in strings: Algorithms and an application 
to software maintenance. SIAM J. on Computing, 26, 1997. 

9. B. S. Baker. Parameterized diff. In Proc. 10th Annual ACM-SIAM Symposium on 
Discrete Algorithms, pages 854-855, 1999. 

10. R. Cole and R. Hariharan. Faster suffix tree construction with missing suffix links. 
In Proc. 32nd ACM STOC, pages 407-415, 2000. 

11. M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their use in improved network
optimization algorithms. J. of the ACM, 34(3):596-615, 1987.

12. H. N. Gabow. Scaling algorithms for network problems. J. of Computer and System 
Sciences, 31:148-168, 1985. 

13. H. N. Gabow and R.E. Tarjan. Faster scaling algorithms for network problems. 
SIAM J. on Computing, 18:1013-1036, 1989. 

14. Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT
News, 17(4):52-54, 1986.

15. S. R. Kosaraju. Faster algorithms for the construction of parameterized suffix 
trees. Proc. 36th IEEE FOCS, pages 631-637, 1995. 

16. G.M. Landau and U. Vishkin. Introducing efficient parallelism into approximate 
string matching. Proc. 18th ACM Symposium on Theory of Computing, pages 
220-230, 1986. 

17. G.M. Landau and U. Vishkin. Fast string matching with k differences. J. Comput. 
Syst. Sci., 37, 1988. 

18. V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and
reversals. Soviet Phys. Dokl., 10:707-710, 1966.

19. G. P. Babu, B. M. Mehtre, and M. S. Kankanhalli. Color indexing for efficient image
retrieval. Multimedia Tools and Applications, 1(4), 1995.

20. M. Y. Kao, T. W. Lam, W. K. Sung, and H. F. Ting. A decomposition theorem for
maximum weight bipartite matching. SIAM J. on Computing, 31(1):18-26, 2001.

21. M. Swain and D. Ballard. Color indexing. International Journal of Computer
Vision, 7(1):11-32, 1991.



Approximation of Rectangle Stabbing and 
Interval Stabbing Problems 



Sofia Kovaleva¹ and Frits C.R. Spieksma²

¹ Department of Quantitative Economics, Maastricht University, P.O. Box 616,
NL-6200 MD Maastricht, The Netherlands, 
sonja.kovalevaOmail . com, 

² Department of Applied Economics, Katholieke Universiteit Leuven, Naamsestraat
69, B-3000, Leuven, Belgium,
frits.spieksma@econ.kuleuven.ac.be



1 Introduction 

The weighted rectangle multi-stabbing problem (WRMS) can be described as 
follows: given is a grid in $\mathbb{R}^2$ consisting of columns and rows each having a
positive integral weight, and a set of closed axis-parallel rectangles each having 
a positive integral demand. The rectangles are placed arbitrarily in the grid with 
the only assumption that each rectangle is intersected by at least one column 
and at least one row. The objective is to find a minimum weight (multi)set of 
columns and rows of the grid so that for each rectangle the total multiplicity of 
selected columns and rows stabbing it is at least its demand. (A column or row 
is said to stab a rectangle if it intersects it.) 
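To pin down the feasibility condition, here is a small Python checker (our own illustration; columns and rows are 0-indexed, and the field names mirror the notation $l_i, r_i, b_i, t_i, d_i$ used in Section 2). It returns the total weight of a multiset of columns and rows, or `None` if some rectangle is stabbed fewer times than its demand.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: int     # leftmost column l_i
    right: int    # rightmost column r_i
    bottom: int   # bottom row b_i
    top: int      # top row t_i
    demand: int   # d_i


def wrms_cost(rects, col_w, row_w, y, z):
    """Weight of the multiset (y, z) of columns and rows, or None if infeasible."""
    for R in rects:
        stabs = sum(y[R.left:R.right + 1]) + sum(z[R.bottom:R.top + 1])
        if stabs < R.demand:
            return None
    return (sum(w * k for w, k in zip(col_w, y))
            + sum(v * k for v, k in zip(row_w, z)))
```

For example, with `rects = [Rect(0, 1, 0, 0, 2), Rect(1, 2, 1, 1, 1)]`, taking column 1 twice satisfies both demands at cost 2 when all column weights are 1.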

We present approximation algorithms for two variants of WRMS (see e.g. Vazi- 
rani [6] for an overview on approximation algorithms). First, we describe a 
2^— approximation algorithm called ROUND for the case where the demand 
of each rectangle is bounded from below by an integer q. Observe that this
provides a 2-approximation algorithm for the WRMS described above, where q = 1.
Second, we present a deterministic $\frac{e}{e-1} \approx 1.582$-approximation algorithm called
STAB for the case when each rectangle in the input is intersected by exactly one
row (this special case is referred to as WIS). Our algorithms are based on
rounding the linear programming relaxation of an integer programming formulation
in an interesting way. We use the following property of our formulation:
the variables can be partitioned into two sets such that, given the values
of one set of variables, one can compute in polynomial time the optimal values
of the variables of the other set, and vice versa. Next, we consider
different ways of rounding one set of variables, compute each time the values
of the remaining variables, and keep the best solution.

Motivation. Although at first sight WRMS may seem rather specific, it
comprises a quite natural combinatorial optimization problem: given a set of
arbitrary connected figures in $\mathbb{R}^2$, stab the figures with the minimum number of
straight lines of at most two different directions. Indeed, suppose we are given
n connected point sets (the figures) $F^i \subset \mathbb{R}^2$, $i \in [1:n]$, and two directions,
specified by vectors $v_1$ and $v_2$ orthogonal to them. Denote $a^i = \min\{(v_1, v) \mid v \in F^i\}$,
$b^i = \max\{(v_1, v) \mid v \in F^i\}$, $c^i = \min\{(v_2, v) \mid v \in F^i\}$, $d^i = \max\{(v_2, v) \mid v \in F^i\}$,
$\forall i \in [1:n]$. Since $F^i$ is connected, a line $\{v \mid (v_1, v) = \alpha\}$ meets $F^i$ exactly when
$a^i \le \alpha \le b^i$, and similarly for $v_2$. Now the problem can be transformed into the problem of
stabbing n closed axis-parallel rectangles $\{(x, y) \in \mathbb{R}^2 \mid a^i \le x \le b^i,\ c^i \le y \le d^i\}$,
$\forall i \in [1:n]$, with a minimum number of axis-parallel lines. By defining a
grid whose rows and columns are the horizontal and vertical lines going through
the edges of the rectangles, and by observing that any vertical (horizontal) line
intersecting a set of rectangles can be replaced by a vertical (horizontal) line
intersecting this same set of rectangles while going through an edge of some
rectangle, it follows that stabbing arbitrary connected figures in the plane using
straight lines of at most two directions is a special case of WRMS.
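The reduction from figures to rectangles only needs the extreme values of the two projections. A Python sketch, assuming each figure is a polygon given by its vertex list (a linear function attains its minimum and maximum over a polygon at vertices); the function names are our own.

```python
def dot(u, v):
    """Inner product (u, v) in the plane."""
    return u[0] * v[0] + u[1] * v[1]


def figures_to_rectangles(figures, v1, v2):
    """Map each polygon (list of vertices) to (a^i, b^i, c^i, d^i)."""
    rects = []
    for F in figures:
        p1 = [dot(v1, v) for v in F]    # projections onto direction v1
        p2 = [dot(v2, v) for v in F]    # projections onto direction v2
        rects.append((min(p1), max(p1), min(p2), max(p2)))
    return rects
```

With $v_1 = (1, 0)$ and $v_2 = (0, 1)$ the triangle with vertices (0, 0), (2, 1), (1, 3) becomes the rectangle $[0, 2] \times [0, 3]$, its axis-parallel bounding box.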

Notice also that integer linear programming problems of the following type can
be reformulated as instances of WRMS: minimize $\{wx \mid (B|C)x \ge b,\ x \ge 0 \text{ integral}\}$,
where b and w are integral vectors and B and C are both 0,1-matrices with consecutive
1's in the rows (a so-called interval matrix). Indeed, construct a grid which
has a column for each column in B and a row for each column in C. For each
row i of matrix (B|C), draw a rectangle i such that it intersects only the columns
and rows of the grid corresponding to the positions of 1's in row i. Observe
that this construction is possible since B and C have consecutive 1's in the
rows. To complete the construction, assign demand $b_i$ to each rectangle i and a
corresponding weight $w_j$ to each column and row of the grid. Let the decision
variables x describe the multiplicities of the columns and rows of the grid. In this
way we have obtained an instance of WRMS. In other words, integer programming
problems where the columns of the constraint matrix A can be permuted
such that A = (B|C), with B and C each being an interval matrix, are a special
case of WRMS.
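The ILP-to-WRMS construction can likewise be sketched in Python (0-indexed; our own illustration). Each row of (B|C) becomes a rectangle whose column range is the block of 1's in the B part and whose row range is the block of 1's in the C part. The sketch assumes, as WRMS does, that every rectangle meets at least one column and one row, and raises `ValueError` otherwise or when the 1's are not consecutive.

```python
def interval(row):
    """(first, last) index of the 1s in a 0/1 row, checking that they are consecutive."""
    ones = [j for j, x in enumerate(row) if x == 1]
    if not ones:
        raise ValueError("row has no 1s: the rectangle would miss every column or row")
    first, last = ones[0], ones[-1]
    if ones != list(range(first, last + 1)):
        raise ValueError("1s are not consecutive")
    return first, last


def ilp_to_wrms(B, C, b):
    """Rectangles as (left, right, bottom, top, demand), one per row of (B|C)."""
    return [(*interval(Bi), *interval(Ci), bi) for Bi, Ci, bi in zip(B, C, b)]
```

For instance, `ilp_to_wrms([[0, 1, 1], [1, 0, 0]], [[1, 1], [0, 1]], [2, 1])` gives `[(1, 2, 0, 1, 2), (0, 0, 1, 1, 1)]`.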

Literature. WRMS has received attention in literature before. Gaur et al. [2] 
present a 2-approximation algorithm for WRMS with unit weights and demands, 
which admits a generalization to arbitrary weights and demands preserving the 
same approximation factor. Thus, our algorithm ROUND with the ratio of 
is an improvement for those instances where a lower bound on the rectangles’ 
demands q is larger than 1. Furthermore, Hassin and Megiddo [3] study a number
of special cases of the problem of stabbing geometric figures in $\mathbb{R}^2$ by a minimum
number of straight lines. In particular, they present a 2-approximation algorithm
for the task of stabbing connected figures of the same shape and size with
horizontal and vertical lines. Moreover, they study the case which in our setting
corresponds to WIS with unit weights and demands, where each rectangle in
the input is intersected by exactly K columns, and give a $(2 - \frac{1}{K})$-approximation
algorithm for this problem. Our $\frac{e}{e-1}$-approximation algorithm provides a
better approximation ratio for $K \ge 3$ and does not impose any restrictions on
the number of columns intersecting rectangles. Finally, the special case of WIS
with unit weights and demands, where each rectangle is stabbed by at most two
columns, is shown to be APX-hard in [5].
Our results. Summarizing our results:

— applying a similar, slightly refined approach, we generalize the results of 
Gaur et al. [2] to obtain a approximation algorithm called ROUND for 
the case where the demand of each rectangle is bounded from below by an 
integer q (Section 3), 

— we describe an $\frac{e}{e-1}$-approximation algorithm called STAB for WIS based
on an original rounding idea (Section 4),

— we state here without proof that the special case of WIS with unit weights 
where each rectangle is stabbed by at most k columns admits an 
-approximation algorithm. For the (difficult) case k = 2, this implies a 
I — approximation algorithm . 

For some of the proofs of the results in the next sections we refer to Kovaleva [4] . 

Applications. For the sake of conciseness we simply refer to Gaur et al. [2] for an 
application of WRMS related to image processing and numerical computations, 
and to Hassin and Megiddo [3] for military and medical applications. 

2 Preliminaries 

Let us formalize the definition of WRMS. Let the grid in the input consist of
t columns and m rows, numbered consecutively from left to right and from
bottom to top, with positive weight $w_c$ ($v_r$) attached to each column c (row r).
Further, we are given n rectangles such that rectangle i has demand $d_i \in \mathbb{Z}_+$
and is specified by leftmost column $l_i$, rightmost column $r_i$, top row $t_i$ and
bottom row $b_i$.

Let us give a natural ILP formulation of WRMS. In this paper we use notation
[a : b] for the set of integers {a, a + 1, ..., b}. The decision variables $y_c, z_r \in \mathbb{Z}_+$,
$c \in [1:t]$, $r \in [1:m]$, denote the multiplicities of column c and row r
respectively.

Minimize $\sum_{c=1}^{t} w_c y_c + \sum_{r=1}^{m} v_r z_r$   (1)

subject to $\sum_{r \in [b_i:t_i]} z_r + \sum_{c \in [l_i:r_i]} y_c \ge d_i \quad \forall i \in [1:n]$   (2)

$z_r, y_c \in \mathbb{Z}_+ \quad \forall r, c.$   (3)

The linear programming relaxation is obtained when replacing the integrality
constraints (3) by the nonnegativity constraints $z_r, y_c \ge 0$, $\forall r, c$.

For an instance I of WRMS and a vector $b \in \mathbb{Z}^n$ we introduce two auxiliary
ILP problems:

$IP^y(I, b)$: Minimize $\sum_{c=1}^{t} w_c y_c$
subject to $\sum_{c \in [l_i:r_i]} y_c \ge b_i \quad \forall i \in [1:n]$   (4)
$y_c \in \mathbb{Z}_+ \quad \forall c \in [1:t]$

$IP^z(I, b)$: Minimize $\sum_{r=1}^{m} v_r z_r$
subject to $\sum_{r \in [b_i:t_i]} z_r \ge b_i \quad \forall i \in [1:n]$   (5)
$z_r \in \mathbb{Z}_+ \quad \forall r \in [1:m]$

Lemma 1. For any $b \in \mathbb{Z}^n$, the LP-relaxation of each of the problems $IP^y(I, b)$
and $IP^z(I, b)$ is integral.

Proof. This follows from the total unimodularity of the constraint matrices of (4) and (5),
which is implied by the “consecutive ones” property (see e.g. Ahuja et al. [1]). □

Corollary 2. The optimum value of $IP^y(I, b)$ ($IP^z(I, b)$) is smaller than or
equal to the value of any feasible solution to its LP-relaxation.



Corollary 3. The problem $IP^y(I, b)$ ($IP^z(I, b)$) can be solved in polynomial
time. Its optimal solution coincides with that of its LP-relaxation.

In fact, the special structure of $IP^y(I, b)$ ($IP^z(I, b)$) allows one to solve it via a
minimum-cost flow algorithm.

Lemma 4. The problem $IP^y(I, b)$ ($IP^z(I, b)$) can be solved in time $O(MCF(t, n+t))$ ($O(MCF(m, n+m))$), where $MCF(p, q)$ is the time needed to solve the minimum cost flow problem on a network with $p$ nodes and $q$ arcs.

Proof. Consider the LP-relaxation of formulation $IP^y(I, b)$ and substitute new variables $u_0, \ldots, u_t$ via $y_c = u_c - u_{c-1}$, $\forall c \in [1:t]$. Then it transforms into

Minimize $-w_1 u_0 + (w_1 - w_2) u_1 + \cdots + (w_{t-1} - w_t) u_{t-1} + w_t u_t$

subject to $u_{r_i} - u_{l_i - 1} \ge b_i \quad \forall i \in [1:n]$ (6)

$u_c - u_{c-1} \ge 0 \quad \forall c \in [1:t]$.

Let us denote the vector of objective coefficients, the vector of right-hand sides and the constraint matrix by $w$, $b$ and $C$, respectively, and the vector of variables by $u$. Then (6) can be represented as $\{\text{minimize } wu \mid Cu \ge b\}$. Its dual is $\{\text{maximize } bx \mid C^T x = w,\ x \ge 0\}$. Observe that this is a minimum cost flow formulation with flow conservation constraints $C^T x = w$, since $C^T$ has exactly one '1' and one '-1' in each column. Given an optimal solution to the minimum cost flow problem, one can easily obtain the optimal dual solution $u_0, \ldots, u_t$ (see Ahuja et al. [1]), and thus the optimal $y_1, \ldots, y_t$ as well. □
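The substitution used in the proof of Lemma 4 can be checked numerically. The sketch below is only an illustration and does not solve the flow problem. The helper names are hypothetical, columns are 1-indexed as in the text, and $u_0$ is fixed to 0. It computes the transformed objective coefficients and the prefix sums $u_c$. It also lists the $n + t$ node pairs (on nodes $0, \ldots, t$) that correspond to the rows of $C$, i.e. to the columns of $C^T$.

```python
def transformed_objective(w):
    """Coefficients of u_0..u_t after substituting y_c = u_c - u_{c-1} (w[0] is w_1)."""
    t = len(w)
    coef = [0] * (t + 1)
    coef[0] = -w[0]                      # -w_1 u_0
    for c in range(1, t):
        coef[c] = w[c - 1] - w[c]        # (w_c - w_{c+1}) u_c
    coef[t] = w[t - 1]                   # w_t u_t
    return coef


def prefix_sums(y):
    """u_0 = 0 and u_c = y_1 + ... + y_c, so that y_c = u_c - u_{c-1}."""
    u = [0]
    for yc in y:
        u.append(u[-1] + yc)
    return u


def network_arcs(t, intervals):
    """One node pair per row of C: each row has one +1 and one -1 entry."""
    arcs = [(l - 1, r) for (l, r) in intervals]      # u_{r_i} - u_{l_i - 1} >= b_i
    arcs += [(c - 1, c) for c in range(1, t + 1)]     # u_c - u_{c-1} >= 0
    return arcs


# usage: the objective value and every interval sum survive the substitution
w, y = [3, 1, 4], [2, 0, 5]
intervals = [(1, 2), (2, 3), (3, 3)]
u = prefix_sums(y)
coef = transformed_objective(w)
assert sum(a * b for a, b in zip(coef, u)) == sum(a * b for a, b in zip(w, y))
for (l, r) in intervals:
    assert sum(y[l - 1:r]) == u[r] - u[l - 1]
print(len(network_arcs(len(w), intervals)))  # 6 arcs on t + 1 = 4 nodes
```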

3 A $\frac{q+1}{q}$-Approximation Algorithm for WRMS$_q$

Let WRMS$_q$ be the special case of WRMS where $d_i \ge q$, $\forall i \in [1:n]$. In this section we describe an algorithm ROUND. Given an instance of WRMS$_q$, the algorithm works as follows:
1. solve the LP-relaxation of (1)-(3) for $I$ and obtain its optimal solution $(y^{lp}, z^{lp})$;

2. $a_i \leftarrow \lfloor \frac{q+1}{q} \sum_{c \in [l_i:r_i]} y^{lp}_c \rfloor$ for all $i \in [1:n]$;

3. solve $IP^y(I, a)$ and obtain $y$;

4. $b_i \leftarrow \lfloor \frac{q+1}{q} \sum_{r \in [b_i:t_i]} z^{lp}_r \rfloor$ for all $i \in [1:n]$;

5. solve $IP^z(I, b)$ and obtain $z$;

6. return $(y, z)$.

Fig. 1. Algorithm ROUND.
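Below is a toy sketch of steps 2-6 of ROUND. It assumes that a feasible fractional solution of the LP-relaxation of (1)-(3) is already available; solving the LP is out of scope. $IP^y$ and $IP^z$ are solved here by exhaustive search over small integer vectors, not by the minimum-cost flow method of Lemma 4, so the sketch only suits tiny instances. The names `round_wrms` and `cover_bruteforce` are hypothetical. Rectangles are given as tuples $(l_i, r_i, b_i, t_i, d_i)$ with 1-indexed inclusive ranges, and `Fraction` keeps the floors exact.

```python
from fractions import Fraction
from itertools import product
from math import floor


def cover_bruteforce(weights, intervals, demand):
    """Exact minimiser of sum_k weights[k] * x[k] subject to: the sum of x over each
    1-indexed inclusive interval [lo:hi] is at least its demand, x integral, x >= 0.
    Exponential-time stand-in for the min-cost-flow solver of Lemma 4."""
    cap = max(demand, default=0)   # with positive weights an optimum never exceeds this
    best_cost, best_x = None, None
    for x in product(range(cap + 1), repeat=len(weights)):
        if all(sum(x[lo - 1:hi]) >= d for (lo, hi), d in zip(intervals, demand)):
            cost = sum(wk * xk for wk, xk in zip(weights, x))
            if best_cost is None or cost < best_cost:
                best_cost, best_x = cost, list(x)
    return best_x


def round_wrms(q, w, v, rects, y_lp, z_lp):
    """Steps 2-6 of ROUND for WRMS_q; (y_lp, z_lp) is a fractional solution of (1)-(3)."""
    f = Fraction(q + 1, q)
    cols = [(l, r) for (l, r, b, t, d) in rects]
    rows = [(b, t) for (l, r, b, t, d) in rects]
    a = [floor(f * sum(y_lp[l - 1:r])) for (l, r) in cols]        # step 2
    y = cover_bruteforce(w, cols, a)                              # step 3: IP^y(I, a)
    bb = [floor(f * sum(z_lp[b - 1:t])) for (b, t) in rows]       # step 4
    z = cover_bruteforce(v, rows, bb)                             # step 5: IP^z(I, b)
    return y, z                                                   # step 6


# usage: q = 2, two columns, two rows, both demands equal to 2
rects = [(1, 1, 1, 1, 2), (1, 2, 2, 2, 2)]
y, z = round_wrms(2, [1, 2], [1, 1], rects,
                  [Fraction(1), Fraction(1, 2)], [Fraction(1), Fraction(3, 2)])
print(y, z)   # [2, 0] [1, 2]
```

The feasibility argument in the proof of Theorem 1 uses only the fact that $(y^{lp}, z^{lp})$ satisfies (2). For that reason the example uses a fractional solution that is merely feasible, not necessarily optimal.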



Theorem 1. Algorithm ROUND is a $\frac{q+1}{q}$-approximation algorithm for the problem WRMS$_q$.

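Proof. Let $I$ be an instance of WRMS$_q$. First, we show that the solution $(y, z)$ returned by ROUND is feasible for $I$, i.e., satisfies constraints (2) and (3). Obviously, vectors $y$ and $z$ are integral; hence, constraint (3) is satisfied. For each $i \in [1:n]$, consider the left-hand side of constraint (2) for $(y, z)$. By construction, $y$ and $z$ are feasible to $IP^y(I, a)$ and $IP^z(I, b)$, respectively. Using this, and by construction of the vectors $a$ and $b$:

$$\sum_{r \in [b_i:t_i]} z_r + \sum_{c \in [l_i:r_i]} y_c \ge a_i + b_i = \Big\lfloor \tfrac{q+1}{q} \sum_{c \in [l_i:r_i]} y^{lp}_c \Big\rfloor + \Big\lfloor \tfrac{q+1}{q} \sum_{r \in [b_i:t_i]} z^{lp}_r \Big\rfloor. \quad (7)$$

Since $\lfloor \alpha \rfloor + \lfloor \beta \rfloor \ge \lfloor \alpha + \beta \rfloor - 1$ for any positive $\alpha$ and $\beta$, and since $z^{lp}$ and $y^{lp}$ satisfy constraint (2), the right-hand side of (7) is greater than or equal to:

$$\Big\lfloor \tfrac{q+1}{q} \Big( \sum_{c \in [l_i:r_i]} y^{lp}_c + \sum_{r \in [b_i:t_i]} z^{lp}_r \Big) \Big\rfloor - 1 \ge \Big\lfloor \tfrac{q+1}{q} d_i \Big\rfloor - 1 \ge d_i,$$

where the last inequality holds because $d_i \ge q$, $\forall i \in [1:n]$ in $I$. Hence $\frac{d_i}{q} \ge 1$ and $\lfloor d_i + \frac{d_i}{q} \rfloor \ge d_i + 1$. Thus, inequality (2) holds for $(y, z)$, and $(y, z)$ constitutes a feasible solution to the instance $I$ of WRMS$_q$.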

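The approximation ratio of ROUND is obtained from the ratio between the value of the output solution $(y, z)$ and the value of the optimal fractional solution $(y^{lp}, z^{lp})$, which is at most the optimum value of WRMS$_q$.

We claim that $\sum_{c=1}^{t} w_c y_c \le \frac{q+1}{q} \sum_{c=1}^{t} w_c y^{lp}_c$. Observe that $\frac{q+1}{q} y^{lp}$ is a feasible solution to the LP-relaxation of $IP^y(I, a)$ with $a_i = \lfloor \frac{q+1}{q} \sum_{c \in [l_i:r_i]} y^{lp}_c \rfloor$, $\forall i \in [1:n]$. Indeed, the constraints of (4) are clearly satisfied: $\sum_{c \in [l_i:r_i]} \frac{q+1}{q} y^{lp}_c \ge \lfloor \frac{q+1}{q} \sum_{c \in [l_i:r_i]} y^{lp}_c \rfloor = a_i$, $\forall i \in [1:n]$. Thus, vectors $y$ and $\frac{q+1}{q} y^{lp}$ are, respectively, an optimal solution to $IP^y(I, a)$ and a feasible solution to its LP relaxation. Now Corollary 2 implies the claim.

Similarly, $\sum_{r=1}^{m} v_r z_r \le \frac{q+1}{q} \sum_{r=1}^{m} v_r z^{lp}_r$. This proves the ratio of $\frac{q+1}{q}$ between the value of solution $(y, z)$ and that of solution $(y^{lp}, z^{lp})$.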

The running time of the algorithm is determined by the time needed to solve the LP-relaxation of (1)-(3) ($LP(n, t+m)$) and the problems $IP^y(I, a)$ and $IP^z(I, b)$, which can be solved in time $MCF(t, t+n)$ and $MCF(m, m+n)$, respectively (see Lemma 4). □

Remark 1. There exist instances of WRMS$_q$ where the ratio between the optimum value of the formulation (1)-(3) and the optimum value of its LP-relaxation is arbitrarily close to $\frac{q+1}{q}$ (see Kovaleva [4]). This suggests that the approximation ratio of ROUND is unlikely to be improved by another LP-rounding algorithm using the same formulation of WRMS$_q$.



4 An $e/(e-1)$-Approximation Algorithm for WIS

The weighted interval stabbing problem (WIS) is a special case of WRMS with unit rectangle demands ($d_i = 1$, $\forall i \in [1:n]$) and each rectangle in the input intersecting exactly one row of the grid. Hence, we can think of the rectangles as line segments, or intervals, lying on the rows of the grid. Let interval $i$ ($\forall i \in [1:n]$) be specified by the indices of the leftmost and the rightmost columns intersecting it, $l_i$ and $r_i$ respectively, and by the index of its row $p_i$.

In this section we describe an algorithm STAB for WIS and show that it is an $e/(e-1)$-approximation algorithm. Let us first adapt the ILP formulation (1)-(3) to WIS:

Minimize $\sum_{c=1}^{t} w_c y_c + \sum_{r=1}^{m} v_r z_r$ (8)

subject to $z_{p_i} + \sum_{c \in [l_i:r_i]} y_c \ge 1 \quad \forall i \in [1:n]$ (9)

$z_r, y_c \in \mathbb{Z}_+ \quad \forall r, c.$ (10)

Informally, algorithm STAB can be described as follows. Given an instance $I$ of WIS, the algorithm solves the LP-relaxation of (8)-(10) and obtains an optimal fractional solution $(y^{lp}, z^{lp})$. Then it constructs $m+1$ candidate solutions and returns the best of them. Candidate solution number $j$ ($j = 0, \ldots, m$) is constructed as follows: set the largest $j$ $z$-values in the fractional solution to 1 and the rest to 0, which gives an integral vector $z$. Solve the problem $IP^y(I, b)$ (4), where $b_i = 1 - z_{p_i}$, $\forall i \in [1:n]$, and obtain an integral $y$. Now $(y, z)$ is the $j$th candidate solution. A formal description of STAB is shown in Figure 2, where we use the following definition: $value(y, z) = \sum_{c=1}^{t} w_c y_c + \sum_{r=1}^{m} v_r z_r$.



In order to formulate and prove the main result of this section, we first 
formulate the following lemma. 

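Lemma 5. Suppose we are given numbers $1 > \Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_m \ge 0$ and $\Delta_{m+1} = 0$. Further, given are positive numbers $a_1, a_2, \ldots, a_m$ and $Y$. Then we have:

$$\min_{j \in [0:m]} \left( \sum_{r=1}^{j} a_r + \frac{Y}{1 - \Delta_{j+1}} \right) \le \frac{e}{e-1} \left( \sum_{r=1}^{m} a_r \Delta_r + Y \right). \quad (11)$$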



We give the proof of this lemma at the end of this section. 
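Inequality (11) can be probed numerically before reading the proof. The sketch below evaluates both sides for given $\Delta$, $a$ and $Y$ and checks them on random inputs. Random testing is not a substitute for the proof, and the function name `lemma5_sides` is hypothetical.

```python
import math
import random


def lemma5_sides(deltas, a, Y):
    """Left- and right-hand sides of (11).
    deltas: 1 > Delta_1 >= ... >= Delta_m >= 0; a: positive a_1..a_m; Y > 0."""
    m = len(deltas)
    ext = list(deltas) + [0.0]           # ext[j] holds Delta_{j+1}, with Delta_{m+1} = 0
    lhs = min(sum(a[:j]) + Y / (1 - ext[j]) for j in range(m + 1))
    rhs = math.e / (math.e - 1) * (sum(ar * dr for ar, dr in zip(a, deltas)) + Y)
    return lhs, rhs


random.seed(1)
for _ in range(20000):
    m = random.randint(0, 8)
    deltas = sorted((0.999 * random.random() for _ in range(m)), reverse=True)
    a = [random.uniform(0.01, 10.0) for _ in range(m)]
    Y = random.uniform(0.01, 10.0)
    lhs, rhs = lemma5_sides(deltas, a, Y)
    assert lhs <= rhs * (1 + 1e-12)      # tiny slack for floating-point rounding
```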
1. solve the LP-relaxation of (8)-(10), and obtain its optimal solution $(y^{lp}, z^{lp})$;

2. reindex the rows of the grid so that $z^{lp}_1 \ge z^{lp}_2 \ge \cdots \ge z^{lp}_m$;

3. $V \leftarrow \infty$;

4. for $j = 0$ to $m$
   for $i = 1$ to $j$: $z_i \leftarrow 1$;
   for $i = j+1$ to $m$: $z_i \leftarrow 0$;
   solve $IP^y(I, b)$, where $b_i = 1 - z_{p_i}$, $\forall i \in [1:n]$, and obtain $y$;
   if $value(y, z) < V$ then $V \leftarrow value(y, z)$, $y^* \leftarrow y$, $z^* \leftarrow z$;

5. return $(y^*, z^*)$.

Fig. 2. Algorithm STAB.
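Below is a toy sketch of the candidate loop of STAB (steps 2-5 of Fig. 2). It assumes that the fractional row values $z^{lp}$ (all in $[0, 1]$) of a solution of the LP-relaxation of (8)-(10) are given. Instead of physically reindexing the rows, it sorts the row indices. $IP^y(I, b)$ has 0/1 demands here, and it is solved by exhaustive search over $\{0,1\}^t$, not by the polynomial method behind Corollary 3. The names `stab` and `cheapest_cover` are hypothetical.

```python
from itertools import product


def cheapest_cover(w, intervals, demand):
    """Exact solution of IP^y(I, b) for 0/1 demands by exhaustive search over y in {0,1}^t
    (with positive weights some optimum is 0/1). Exponential: toy instances only."""
    best_cost, best_y = None, None
    for y in product((0, 1), repeat=len(w)):
        if all(sum(y[l - 1:r]) >= d for (l, r), d in zip(intervals, demand)):
            cost = sum(wc * yc for wc, yc in zip(w, y))
            if best_cost is None or cost < best_cost:
                best_cost, best_y = cost, list(y)
    return best_cost, best_y


def stab(w, v, intervals, z_lp):
    """Candidate loop of STAB. intervals: (l_i, r_i, p_i), 1-indexed and inclusive;
    z_lp: fractional row values of a solution of the LP-relaxation of (8)-(10)."""
    m = len(v)
    order = sorted(range(m), key=lambda r: -z_lp[r])    # rows by decreasing z_lp (step 2)
    cols = [(l, r) for (l, r, p) in intervals]
    best = (float("inf"), None, None)                   # (V, y*, z*), step 3
    for j in range(m + 1):                              # step 4
        z = [0] * m
        for r in order[:j]:
            z[r] = 1                                    # the j largest values become 1
        b = [1 - z[p - 1] for (l, r, p) in intervals]   # b_i = 1 - z_{p_i}
        cost_y, y = cheapest_cover(w, cols, b)
        value = cost_y + sum(vr * zr for vr, zr in zip(v, z))
        if value < best[0]:
            best = (value, y, z)
    return best                                         # step 5, together with its value


# usage: intervals on row 1 (column 1), row 2 (columns 1-2) and row 1 (column 2);
# z_lp = (0.5, 0) comes from the feasible fractional solution y = (0.5, 0.5), z = (0.5, 0)
print(stab([1, 2], [1, 3], [(1, 1, 1), (1, 2, 2), (2, 2, 1)], [0.5, 0.0]))  # (2, [1, 0], [1, 0])
```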



Theorem 6. Algorithm STAB is an $e/(e-1) \approx 1.582$-approximation algorithm for WIS.

Proof. Obviously, algorithm STAB is a polynomial-time algorithm (see Corollary 3). It is also easy to verify that it always returns a feasible solution. Let us establish the approximation ratio.

Consider an instance $I$ of WIS, and let $(y^{lp}, z^{lp})$ and $(y^A, z^A)$ be, respectively, an optimal solution to the LP-relaxation of (8)-(10) and the solution returned by the algorithm for $I$. We show that their values are related as follows:

$$value(y^A, z^A) \le \frac{e}{e-1}\, value(y^{lp}, z^{lp}). \quad (12)$$

Since $value(y^{lp}, z^{lp})$ is at most the optimal value of WIS, the theorem follows.

Assume, without loss of generality, that the rows of the grid are sorted so that $z^{lp}_1 \ge z^{lp}_2 \ge \cdots \ge z^{lp}_m$. Further, suppose there are $p$ $z^{lp}$-values equal to 1, i.e., $z^{lp}_1 = \cdots = z^{lp}_p = 1$ and

$$1 > z^{lp}_{p+1} \ge z^{lp}_{p+2} \ge \cdots \ge z^{lp}_m \ge 0.$$

Let $(y^j, z^j)$ be candidate solution number $j$ constructed by STAB for $I$, $\forall j \in [0:m]$. From the design of STAB we know that:

$$value(y^A, z^A) = \min_{j \in [0:m]} value(y^j, z^j) \le \min_{j \in [p:m]} value(y^j, z^j). \quad (13)$$



Claim 1. $value(y^j, z^j) = wy^j + vz^j \le \sum_{r=1}^{j} v_r + \frac{wy^{lp}}{1 - z^{lp}_{j+1}}$, for any $j \in [p:m]$.

Let us prove it. Consider $(y^j, z^j)$ for some $j \in [p:m]$. By construction:

- $z^j_r = 1$, $\forall r \le j$,

- $z^j_r = 0$, $\forall r \ge j+1$,

- $y^j$ is an optimal solution to $IP^y(I, b)$ with $b_i = 1 - z^j_{p_i}$, $\forall i \in [1:n]$.

Clearly, $vz^j = \sum_{r=1}^{m} v_r z^j_r = \sum_{r=1}^{j} v_r$. Let us show that

$$wy^j \le \frac{wy^{lp}}{1 - z^{lp}_{j+1}}. \quad (14)$$
To prove this, we establish that the fractional solution

$$\frac{y^{lp}}{1 - z^{lp}_{j+1}}, \quad (15)$$

where we set $z^{lp}_{m+1} = 0$, is feasible to the LP relaxation of $IP^y(I, b)$. Its denominator is positive because $j \ge p$. Since $y^j$ is optimal to $IP^y(I, b)$, Corollary 2 implies (14). So, let us prove:

Claim 1.1. Solution (15) is feasible to the LP relaxation of $IP^y(I, b)$ with $b_i = 1 - z^j_{p_i}$, $\forall i \in [1:n]$.
We show that constraint (4) is satisfied:

$$\sum_{c \in [l_i:r_i]} \frac{y^{lp}_c}{1 - z^{lp}_{j+1}} \ge 1 - z^j_{p_i} \quad \text{for any } i \in [1:n]. \quad (16)$$

Indeed, in case $z^j_{p_i} = 1$, the inequality trivially holds. Otherwise, if $z^j_{p_i} = 0$, it follows from the construction of $z^j$ that $p_i \ge j+1$. The ordering of the $z^{lp}$-values implies that $z^{lp}_{p_i} \le z^{lp}_{j+1}$. Then, using this and the fact that solution $(y^{lp}, z^{lp})$ satisfies constraint (9), we have:

$$\sum_{c \in [l_i:r_i]} \frac{y^{lp}_c}{1 - z^{lp}_{j+1}} \ge \frac{1 - z^{lp}_{p_i}}{1 - z^{lp}_{j+1}} \ge \frac{1 - z^{lp}_{j+1}}{1 - z^{lp}_{j+1}} = 1.$$

This proves (16), and consequently Claim 1.1 and Claim 1.



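From (13) and Claim 1:

$$value(y^A, z^A) \le \min_{j \in [p:m]} \left( \sum_{r=1}^{j} v_r + \frac{wy^{lp}}{1 - z^{lp}_{j+1}} \right) = \sum_{r=1}^{p} v_r + \min_{j \in [p:m]} \left( \sum_{r=p+1}^{j} v_r + \frac{wy^{lp}}{1 - z^{lp}_{j+1}} \right).$$

Lemma 5, applied with $\Delta_k = z^{lp}_{p+k}$ and $a_k = v_{p+k}$ for $k \in [1:m-p]$, and $Y = wy^{lp}$, now implies that the last expression can be upper bounded by

$$\sum_{r=1}^{p} v_r + \frac{e}{e-1} \left( \sum_{r=p+1}^{m} v_r z^{lp}_r + wy^{lp} \right) \le \frac{e}{e-1} \left( \sum_{r=1}^{p} v_r + \sum_{r=p+1}^{m} v_r z^{lp}_r + wy^{lp} \right).$$

Since $z^{lp}_1 = \cdots = z^{lp}_p = 1$, the last expression can be rewritten as:

$$\frac{e}{e-1} \left( \sum_{r=1}^{m} v_r z^{lp}_r + wy^{lp} \right) = \frac{e}{e-1} \left( vz^{lp} + wy^{lp} \right),$$

which establishes inequality (12) and proves the theorem. □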



Remark 2. There exist instances of WIS for which the ratio between the optimum values of (8)-(10) and its LP relaxation is arbitrarily close to $e/(e-1)$ (see Kovaleva [4]). Thus, the approximation ratio of STAB is unlikely to be improved by another LP-rounding algorithm using the same formulation of WIS.
Now we give a proof of Lemma 5. This version of the proof is due to Jiří Sgall (see acknowledgements).



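Lemma 5. Suppose we are given numbers $1 > \Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_m \ge 0$ and $\Delta_{m+1} = 0$. Further, given are positive numbers $a_1, a_2, \ldots, a_m$ and $Y$. Then we have:

$$\min_{j \in [0:m]} \left( \sum_{r=1}^{j} a_r + \frac{Y}{1 - \Delta_{j+1}} \right) \le \frac{e}{e-1} \left( \sum_{r=1}^{m} a_r \Delta_r + Y \right). \quad (17)$$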



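Proof. We use mathematical induction on the size $m$ of the inequality. For $m = 0$, the statement trivially holds: the left-hand side equals $Y$. Suppose the lemma has been proven for any inequality of size smaller than $m$. First, consider the case $\Delta_1 \ge \frac{e-1}{e}$. We can write:

$$\min_{j \in [0:m]} \left( \sum_{r=1}^{j} a_r + \frac{Y}{1 - \Delta_{j+1}} \right) \le \min_{j \in [1:m]} \left( \sum_{r=1}^{j} a_r + \frac{Y}{1 - \Delta_{j+1}} \right) = a_1 + \min_{j \in [1:m]} \left( \sum_{r=2}^{j} a_r + \frac{Y}{1 - \Delta_{j+1}} \right).$$

The last minimum is the LHS of (17) for a smaller sequence: $\Delta_2, \ldots, \Delta_m$ and $a_2, \ldots, a_m$. Applying the induction hypothesis, and also our bound on $\Delta_1$, we can bound the last expression from above as follows:

$$a_1 + \frac{e}{e-1} \left( \sum_{r=2}^{m} a_r \Delta_r + Y \right) \le a_1 \cdot \frac{e}{e-1} \Delta_1 + \frac{e}{e-1} \left( \sum_{r=2}^{m} a_r \Delta_r + Y \right) = \frac{e}{e-1} \left( \sum_{r=1}^{m} a_r \Delta_r + Y \right).$$

So we have shown an induction step for the case $\Delta_1 \ge \frac{e-1}{e}$. For the remaining case, $\Delta_1 < \frac{e-1}{e}$, we give a direct proof below.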

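Suppose $\Delta_1 < \frac{e-1}{e}$. Denote the LHS of (17) by $X$. Since $X$ is a minimum over $j$, it is at most the $j$th term, so:

$$\sum_{r=1}^{j} a_r \ge X - \frac{Y}{1 - \Delta_{j+1}}, \quad \text{for } 0 \le j \le m. \quad (18)$$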



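The following steps are justified below:

$$\begin{aligned}
\sum_{r=1}^{m} a_r \Delta_r + Y &= \sum_{j=1}^{m} (\Delta_j - \Delta_{j+1}) \Big( \sum_{r=1}^{j} a_r \Big) + Y \\
&\overset{(a)}{\ge} \sum_{j=1}^{m} (\Delta_j - \Delta_{j+1}) \Big( X - \frac{Y}{1 - \Delta_{j+1}} \Big) + Y \\
&= \Delta_1 X + \Big( 1 - m + \sum_{j=1}^{m} \frac{1 - \Delta_j}{1 - \Delta_{j+1}} \Big) Y \\
&\overset{(b)}{\ge} \Delta_1 X + \Big( 1 - m + m (1 - \Delta_1)^{1/m} \Big) Y \\
&\overset{(c)}{\ge} \Delta_1 X + \Big( 1 - m + m (1 - \Delta_1)^{1/m} \Big) (1 - \Delta_1) X \\
&= \Big( 1 + m \big( -1 + (1 - \Delta_1)^{1/m} \big) (1 - \Delta_1) \Big) X \overset{(d)}{\ge} \Big( 1 - \frac{1}{e} \Big) X.
\end{aligned}$$

(a) Here we use the ordering of the $\Delta$'s and inequality (18).

(b) This inequality follows from the arithmetic-geometric mean inequality $\frac{1}{m} \sum_{j=1}^{m} x_j \ge \big( \prod_{j=1}^{m} x_j \big)^{1/m}$, used for the positive numbers $x_j = \frac{1 - \Delta_j}{1 - \Delta_{j+1}}$, whose product telescopes to $\frac{1 - \Delta_1}{1 - \Delta_{m+1}} = 1 - \Delta_1$.

(c) Here we use the inequality $Y \ge (1 - \Delta_1) X$, which is implied by (18) for $j = 0$. We also use the fact that the coefficient of $Y$ is nonnegative, which follows from $1 - \Delta_1 > \frac{1}{e} > (1 - \frac{1}{m})^m$.

(d) This inequality follows by elementary calculus. The minimum of the LHS over all $\Delta_1$ is achieved for $1 - \Delta_1 = (\frac{m}{m+1})^m$. After substituting this value, the inequality reduces to

$$1 - \Big( \frac{m}{m+1} \Big)^{m+1} \ge 1 - \frac{1}{e}.$$

Hence $X \le \frac{e}{e-1} \left( \sum_{r=1}^{m} a_r \Delta_r + Y \right)$, as required. □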
Abstract. We show the first $o(n^2)$ algorithm for coloring the vertices of triangle-free planar graphs using three colors. The time complexity of the algorithm is $O(n \log n)$. Our approach can also be used to design $O(n\,\mathrm{polylog}\, n)$-time algorithms for two other similar coloring problems.

A remarkable ingredient of our algorithm is the data structure processing short 
path queries introduced recently in [9]. In this paper we show how to adapt it to 
the fully dynamic environment where edge insertions and deletions are allowed. 



1 Introduction 

The famous Four-Color Theorem says that every planar graph is vertex 4-colorable. The paper of Robertson et al. [10] describes an $O(n^2)$ 4-coloring algorithm. This seems to be very hard to improve, since it would probably require a new proof of the 4-Color Theorem. On the other hand, there are several linear 5-coloring algorithms (see e.g. [4]). Thus efficiently coloring planar graphs using only three colors (if possible) seems to be the most interesting area still open for research. Although the general decision problem is NP-hard [6], the renowned Grötzsch's Theorem [8] guarantees that every triangle-free planar graph is 3-colorable. It seems to be widely known that the simplest known proofs, by Carsten Thomassen (see [11,12]), can be easily transformed into $O(n^2)$ algorithms. In this paper we improve this bound to $O(n \log n)$.

In § 2 we present a new proof of Grötzsch's Theorem, based on a paper of Thomassen [11]. The proof is inductive and corresponds to a recursive algorithm. The proof is written in such a way that it can be immediately transformed into the algorithm. In fact, the proof can be treated as a description of the algorithm mixed with a proof of its correctness. In § 3 we discuss how to implement the algorithm efficiently. We describe the details of non-trivial operations as well as the data structures needed to perform these operations fast. The description and analysis of the most involved one, the Short Path Data Structure (SPDS), is contained in § 4. This data structure is of independent interest. In the paper [9] we presented a data structure that is built in $O(n)$ time and finds shortest paths between given pairs of vertices in constant time, provided that the distance between the vertices is bounded. In our coloring algorithm we need to find paths only of length 1 and 2. Hence here we show a simplified version of the previous, general structure. The SPDS is described from scratch not only to make this paper self-contained but also because we need to introduce some modifications and extensions, like insert and identify operations, not described in our previous paper [9]. We claim that techniques from the present paper can be extended to the general case, making it possible to use the general data structure in the fully dynamic environment. The description of this extension is beyond the scope of this paper and is intended to appear in the journal version of the paper [9].

* The research has been supported by KBN grant 4T11C04425. Part of the research was done during the author's stay at BRICS, Aarhus University, Denmark.

Very recently, two new Grötzsch-like theorems appeared. The first one says that any
planar graph without triangles at distance less than 4 and without 5-cycles is 3-colorable 
(see [2]). The other theorem states that planar graphs without cycles of length from 4 
to 7 are 3-colorable (see [1]). We claim that our techniques can be adapted to transform 
these proofs to 0{n log® n) and 0{n log^ n) algorithms respectively (a careful analysis 
would probably help to lower these bounds). However, we do not show it in the present 
paper, since it involves the general short path structure. 

Terminology. We assume the reader is familiar with standard terminology and notation concerning graph theory, and planar graphs in particular (see e.g. [13]). Let us recall here some notions that are not so widely used. Let $f$ be a face of a connected plane graph. A facial walk $w$ corresponding to $f$ is the shortest closed walk induced by all edges incident with $f$. If the boundary of $f$ is a cycle, the walk is called a facial cycle. The length of walk $w$ is denoted by $|w|$. The length of face $f$ is denoted by $|f|$ and equals $|w|$. Let $C$ be a simple cycle in a plane graph $G$. The length of $C$ will be denoted by $|C|$. The cycle $C$ divides the plane into two disjoint open domains, $D$ and $E$, such that $D$ is homeomorphic to an open disc. The set consisting of all vertices of $G$ belonging to $D$ and of all edges crossing this domain is denoted by $\mathrm{int}\, C$. Observe that $\mathrm{int}\, C$ is not necessarily a graph, while $C \cup \mathrm{int}\, C$ is a subgraph of $G$. A $k$-path ($k$-cycle, $k$-face) refers to a path (cycle, face) of length $k$.

2 A Proof of Grötzsch's Theorem

In this section we give a new proof of Grötzsch's Theorem. The proof is based on ideas
of C. Thomassen [11]. The reason for writing the new proof is that the original one 
corresponds to an 0(nlog® n) algorithm when we employ our algorithmic techniques 
presented in the following sections. In the algorithm corresponding to the proof presented below we do not need to search for paths of length 3 and 4, which reduces the time complexity to $O(n \log n)$. We could also use the recent proof of Thomassen [12], but we suspect that the resulting algorithm would be more complicated and harder to describe. We will need the following lemma, which can be easily proved using the discharging technique. Due to space limitations we omit the proof.

Lemma 1. Let $G$ be a biconnected plane graph with every inner face of length at least 5, and with the outer face $C$ of length $4 \le |C| \le 6$. Furthermore, assume that every vertex not in $V(C)$ has degree at least 3 and that there is no pair of adjacent vertices of degree 2. Then $G$ has a facial cycle $C'$ such that $V(C') \cap V(C) = \emptyset$ and all the vertices of $C'$ are of degree 3 except, possibly, one of degree at most 5.

Instead of proving Grötzsch's Theorem directly, it will be easier for us to show the following more general result (note that it also follows from Theorem 5.3 in [7]). A 3-coloring of a cycle $C$ is called safe if $|C| < 6$ or the sequence of successive colors on the cycle is neither (1, 2, 3, 1, 2, 3) nor (3, 2, 1, 3, 2, 1).
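The safety condition can be stated as a small predicate. The sketch below assumes that the two forbidden sequences are meant up to the choice of the starting vertex on $C$. Under that reading, a 6-cycle coloring is unsafe exactly when it reads $a, b, c, a, b, c$ with $a, b, c$ distinct, because the cyclic shifts of the two sequences give all six orderings of $\{1, 2, 3\}$. The function name is hypothetical.

```python
def is_safe(colours):
    """colours: the colours of the successive vertices of a cycle C with |C| <= 6.
    Assumes the forbidden sequences are taken up to the choice of starting vertex."""
    assert len(colours) <= 6, "safety is only used for outer cycles of length at most 6"
    if len(colours) < 6:
        return True
    # (a, b, c, a, b, c) with a, b, c distinct covers every cyclic shift of
    # (1, 2, 3, 1, 2, 3) and of (3, 2, 1, 3, 2, 1)
    repeats = all(colours[i] == colours[i + 3] for i in range(3))
    return not (repeats and len(set(colours[:3])) == 3)


print(is_safe([1, 2, 3, 1, 2, 3]), is_safe([1, 2, 1, 2, 1, 2]))  # False True
```

This is the property used in Case 2. On an unsafe 6-cycle, a vertex adjacent to three pairwise non-consecutive cycle vertices would see all three colors.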

Theorem 1. Any connected triangle-free plane graph $G$ is 3-colorable. Moreover, if the boundary of the outer face of $G$ is a cycle $C$ of length at most 6, then any safe 3-coloring of $G[V(C)]$ can be extended to a 3-coloring of $G$.

Proof. The proof is by induction on $|V(G)|$. We assume that $G$ has at least one uncolored vertex, for otherwise there is nothing left to do. We are going to consider several cases. After each case we assume that none of the previously considered cases applies to $G$.

Case 1. $G$ has an uncolored vertex $x$ of degree at most 2. Then we can remove $x$ and easily complete the proof by induction. If $x$ is a cutvertex, the induction is applied to each of the connected components of the resulting graph.

Case 2. $G$ has a vertex $x$ joined to two or three colored vertices. Note that if $x$ has 3 colored neighbors, then $|C| = 6$ and the neighbors cannot have 3 different colors, as the coloring of $C$ is safe. Thus we extend the 3-coloring of $C$ to a 3-coloring of $G[V(C) \cup \{x\}]$. Note that for every facial cycle of $G[V(C) \cup \{x\}]$ the resulting 3-coloring is safe. Then we can apply induction to each face of $G[V(C) \cup \{x\}]$.

Case 3. $C$ is colored and has a chord. We proceed similarly as in Case 2.

Case 4. $G$ has a facial walk $C' = x_1 x_2 \cdots x_k x_1$ such that $k \ge 6$ and at least 1 vertex of $C'$ is uncolored. As Case 2 is excluded, we can assume that $x_1$ and $x_2$ are uncolored.

Case 4a. Assume that $x_3$ is colored and $x_1$ has a colored neighbor $z$. As Case 2 is excluded, $x_2$ and $x_1$ have no colored neighbors except for $x_3$ and $z$, respectively. Then $G[V(C) \cup \{x_1, x_2\}]$ has precisely two inner faces, each of length at least 4. Let $C_1$ and $C_2$ denote the facial cycles corresponding to these faces and let $|C_1| \le |C_2|$. We see that $|C_1| \le 6$, because otherwise $|C| \ge 7$ and $C$ is not colored. We can assume that $C_1$ is separating, for otherwise $C_1 = C'$, $|C_1| = 6$, $|C_2| = 6$ and $C_2$ is separating. Let $G' = G - \mathrm{int}(C_1)$. Observe that if cycle $C_1$ is of length 6, we can add an edge to $G'$ joining $x_1$ and the vertex $w$ at distance 3 from $x_1$ in $C_1$ without creating a triangle. Then we can apply the induction hypothesis to $G'$. Note that the resulting 3-coloring of $C_1$ is safe, because when $|C_1| = 6$ the vertices $x_1$ and $w$ are adjacent in $G'$. Moreover, as $C_1$ is chordless in $G$, it is also a 3-coloring of $G[V(C_1)]$, so we can use induction and extend the coloring to $C_1 \cup \mathrm{int}(C_1)$.

Case 4b. There is a path $x_1 y_1 y_2 x_3$ or $x_1 y_1 x_3$ distinct from $x_1 x_2 x_3$. Since $G$ does not contain a triangle, $x_2 \notin \{y_1, y_2\}$. Let $C''$ be the cycle $x_3 x_2 x_1 y_1 y_2 x_3$ or $x_3 x_2 x_1 y_1 x_3$, respectively. Let $G_1 = G - \mathrm{int}(C'')$. Then $|V(G_1)| < |V(G)|$ because $\deg_G(x_2) \ge 3$. By the induction hypothesis, the 3-coloring of $C$ can be extended to a 3-coloring of $G_1$. The resulting 3-coloring of $C''$ is also a safe 3-coloring of $G[V(C'')]$, since $|C''| \le 5$ and $C''$ is chordless. Thus we can use induction to find a 3-coloring of $C'' \cup \mathrm{int}(C'')$.

Case 4c. Since Cases 4a and 4b are excluded, we can identify $x_1$ and $x_3$ without creating a chord in $C$ or a triangle in the resulting graph $G'$. Hence we can apply the induction hypothesis to $G'$.

Case 5. $G$ has a facial cycle $C'$ of length 4. Furthermore, if $C$ is colored, assume that $C' \neq C$. Observe that $C'$ has two opposite vertices $u, v$ such that one of them is not colored and identifying $u$ with $v$ does not create an edge joining two colored vertices. Otherwise there is a triangle in $G$, or one of Cases 2 and 3 occurs.

Case 5a. There is a path $u y_1 v$ or $u y_1 y_2 v$ with $y_1 \notin V(C')$. Then $y_2 \notin V(C')$, for otherwise there is a triangle in $G$. Then the path, together with one of the two $u,v$-paths contained in $C'$, creates a separating cycle $C''$ of length 4 or 5, respectively. Then we can apply induction as in Case 4b.

Case 5b. Since Case 5a is excluded, we can identify $u$ and $v$ without creating a triangle or a multiple edge. Thus it suffices to apply induction to the resulting graph. Observe that when $C$ was colored and $|C| = 6$, the coloring of the outer cycle remains safe.

Case 6 (main reduction). Observe that since the inner faces of $G$ have length 5 and $G$ is triangle-free, $G$ is biconnected. Moreover, there is no pair of adjacent vertices of degree 2. Otherwise the 5-face containing this pair would contain either a vertex joined to two colored vertices or a chord of $C$ (Case 2 or 3, respectively). Hence by Lemma 1 there is a face $C' = x_1 x_2 x_3 x_4 x_5 x_1$ in $G$ such that $\deg(x_1) = \deg(x_2) = \deg(x_3) = \deg(x_4) = 3$, $\deg(x_5) \le 5$ and $V(C) \cap V(C') = \emptyset$. Then the vertices $x_i$ are uncolored. Let $y_i$ be the neighbor of $x_i$ in $G - C'$, for $i = 1, 2, 3, 4$. Moreover, let $y_5, \ldots, y_m$ be the neighbors of $x_5$ in $G - C'$.

Case 6a. $y_i = y_j$ for $i \neq j$. Since $G$ is triangle-free, $x_i$ and $x_j$ are at distance 2 in $C'$. Then there is a 4-cycle $x_i y_i x_j z x_i$ in $G$, where $z$ is the common neighbor of $x_i$ and $x_j$ on $C'$. As Case 5 is excluded, the cycle is separating and we proceed as in Case 4a.

Case 6b. $y_i y_j \in E(G)$ for $i \neq j$. Then there is a separating cycle of length 4 or 5 in $G$, and we proceed as in Case 4a again.

Case 6c. There are three distinct vertices $x_i, x_j, x_k \in \{x_1, \ldots, x_5\}$ such that each has a colored neighbor. Then at least one of the cycles in $G[V(C) \cup \{x_i, x_j, x_k\}]$ is a separating cycle in $G$ of length from 4 to 6. Then we can proceed exactly as in Case 4a.
By symmetry we can assume that $y_1$ or $y_2$ is not colored (i.e. we change the denotations of the vertices $x_1, \ldots, x_4$ and $y_1, \ldots, y_4$, if needed).

Case 6d. $y_i$, $y_j$ and $z$ are colored, and $z$ is a neighbor of $y_k$, for distinct $i, j, k$. Moreover, $y_i$ and $z$ have the same color and $x_i$ is adjacent to $x_k$. Again one can see that at least one of the cycles in $G[V(C) \cup V(C') \cup \{y_k\}]$ is a separating cycle in $G$ of length from 4 to 6, so we can proceed as in Case 4a.

Case 6e. $y_2$ and a neighbor of $y_1$ have the same color (or $y_1$ and a neighbor of $y_2$ have the same color). As Case 6d is excluded, $y_3$ and $y_4$ are uncolored. By symmetry we can assume that identifying $y_1$ and $y_2$ does not introduce an edge with both ends of the same color (i.e. we change the denotations of the vertices $x_1, \ldots, x_4$ and $y_1, \ldots, y_4$, if needed).

Case 6f. $y_3$ is colored and $x_5$ has a colored neighbor. As Cases 6c and 6d are excluded, $y_2$ is uncolored and $y_4$ has no neighbor with the same color as $y_3$. By symmetry we can assume that identifying $y_1$ with $y_2$ and $y_3$ with $x_5$ does not introduce an edge with both ends of the same color.

Case 6g. There is a path $y_1 w_1 w_2 y_2$ distinct from $y_1 x_1 x_2 y_2$. Then we consider the cycle $C'' = y_1 w_1 w_2 y_2 x_2 x_1 y_1$. As $G$ is triangle-free, $\deg x_1 = \deg x_2 = 3$ and $y_1 y_2 \notin E(G)$, the cycle $C''$ has no chord. Case 4 is excluded, so $C''$ is a separating cycle and we can proceed similarly as in Case 4a. The only difference is that this time, if $|C''| = 6$, we try to join $x_2$ and $w_1$ to ensure that a safe coloring of $C''$ is obtained. We cannot do this when there is a 2-path in $G - \mathrm{int}(C'')$ between the vertices we want to join, because it would create a triangle. Since $\deg x_2 = 3$, this 2-path is $x_2 x_3 w_1$. But then the 5-cycle $x_2 x_3 w_1 y_1 x_1 x_2$ is separating and we proceed as in Case 4a.
Case 6h. There is a path $x_5 y_k w y_3$ for some $k \in \{5, \ldots, m\}$. Then we consider the cycle $C'' = y_k x_5 x_4 x_3 y_3 w y_k$ and proceed similarly as in Case 6g.

Case 6i. Let $G'$ be the graph obtained from $G$ by deleting $x_1, x_2, x_3, x_4$ and identifying $x_5$ with $y_3$ and $y_1$ with $y_2$. Since we excluded Cases 6c-6h, $G'$ is triangle-free and the graph induced by the colored vertices of $G'$ is properly 3-colored. Thus we can use induction to get a 3-coloring $c$ of $G'$. We then extend $c$ to a 3-coloring of $G$. As Case 6b is excluded, the (partial) coloring inherited from $G'$ is proper. If $c(y_1) = c(x_5)$, then we color $x_4, x_3, x_2, x_1$, in that order, always using a free color. If $c(y_1) \neq c(x_5)$, we put $c(x_2) = c(x_5)$ and color $x_4, x_3, x_1$ in that order. This completes the proof. □
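The color-extension step at the end of Case 6i can be checked on the vertices it touches. The sketch below is a hypothetical illustration. It models only the face $C' = x_1 \ldots x_5$ and the edges $x_i y_i$ for $i \le 4$, with colors 1-3. Its input is the colors of $y_1, \ldots, y_4, x_5$ taken from the coloring of $G'$, in which identification forces $c(y_1) = c(y_2)$ and $c(x_5) = c(y_3)$. Where the text says "a free color", the sketch picks the smallest free color.

```python
NBRS = {                                    # neighbourhoods of the degree-3 vertices x1..x4
    "x1": ("x2", "x5", "y1"),
    "x2": ("x1", "x3", "y2"),
    "x3": ("x2", "x4", "y3"),
    "x4": ("x3", "x5", "y4"),
}
EDGES = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x5"), ("x5", "x1"),
         ("x1", "y1"), ("x2", "y2"), ("x3", "y3"), ("x4", "y4")]


def free_colour(c, v):
    """Smallest colour not used by an already coloured neighbour of v."""
    used = {c[u] for u in NBRS[v] if u in c}
    return min({1, 2, 3} - used)            # the colouring order leaves a colour free


def extend_case_6i(inherited):
    c = dict(inherited)
    assert c["y1"] == c["y2"] and c["y3"] == c["x5"]   # effect of the identifications
    if c["y1"] == c["x5"]:
        order = ["x4", "x3", "x2", "x1"]
    else:
        c["x2"] = c["x5"]                   # allowed: c(y2) = c(y1) != c(x5)
        order = ["x4", "x3", "x1"]
    for v in order:
        c[v] = free_colour(c, v)
    return c


def is_proper(c):
    return all(c[p] != c[q] for p, q in EDGES)


print(extend_case_6i({"y1": 1, "y2": 1, "y3": 2, "x5": 2, "y4": 3}))
```

In both branches, each vertex colored greedily has at most two distinct colors among its already colored neighbors, so a free color always exists.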

3 An Algorithm 

The proof of Grötzsch's theorem presented in § 2 can be treated as a scheme of an algorithm. As the proof is inductive, the most natural approach is to make the algorithm recursive. In this section we describe how to implement the recursive algorithm arising from the proof efficiently. In particular, we need to explain how to recognize the successive cases and how to perform the relevant reductions efficiently. We start by describing the data structures used by our algorithm. Then we discuss how to use recursion efficiently and how to implement some non-trivial operations of the algorithm. Throughout the paper, $G$ refers to the graph given in the input of our recursive algorithm and $n$ denotes the number of its vertices.

3.1 Data Structures 

Input Graph, Adjacency Lists. W.l.o.g. we can assume that the input graph is connected, for otherwise the algorithm is executed separately in each connected component. Moreover, the input graph is given in the form of adjacency lists. We also assume that a planar embedding of the graph is given, i.e. the neighbors of each vertex appear in the relevant adjacency list in the clockwise order given by the embedding.

Faces and Face Queues. Observe that using a planar embedding stored in adjacency lists we can easily compute the faces of the input graph. As the graph is connected, each face corresponds to a certain facial walk. For every edge $uv$ there are at most two faces adjacent to $uv$. A face is called the right face adjacent to $(u, v)$ when $v$ succeeds $u$ in the sequence of successive vertices of the corresponding facial walk, taken in clockwise order. Otherwise the face is called the left face adjacent to $(u, v)$. Each face $f$ is stored as the corresponding facial walk, i.e. a list of pointers to the successive adjacency list elements corresponding to the edges of the walk. For each such element $e$ corresponding to neighbor $v$ of vertex $u$, face $f$ is the right face adjacent to $(u, v)$. Additionally, $e$ stores a pointer to $f$. Each face also stores its length, i.e. the length of the corresponding facial walk.
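The face computation mentioned above can be sketched with a rotation system. The sketch assumes the embedding is given as a dict mapping each vertex to its neighbours in clockwise order. It recovers the facial walks as the orbits of the "next dart" permutation, which maps dart $(a, b)$ to $(b, w)$, where $w$ follows $a$ in the rotation at $b$. With this rule, which side (left or right) of each dart the traced face lies on depends on the orientation convention, and the sketch does not rely on it. The function name is hypothetical, and it builds plain Python lists rather than the pointer structure described in the text.

```python
def faces_from_rotation(rot):
    """Facial walks of a connected plane graph given by a rotation system.
    rot[v] lists the neighbours of v in clockwise order. Each face is returned
    as the list of darts (u, v) traversed along its facial walk."""
    # position of every neighbour inside each rotation, for O(1) successor lookups
    pos = {v: {u: i for i, u in enumerate(nbrs)} for v, nbrs in rot.items()}
    seen, faces = set(), []
    for start_u in rot:
        for start_v in rot[start_u]:
            dart = (start_u, start_v)
            if dart in seen:
                continue
            face = []
            while dart not in seen:           # the dart map is a bijection, so the
                seen.add(dart)                # walk closes again at its first dart
                face.append(dart)
                a, b = dart
                nbrs = rot[b]
                w = nbrs[(pos[b][a] + 1) % len(nbrs)]   # neighbour after a around b
                dart = (b, w)
            faces.append(face)
    return faces


square = {0: [1, 3], 1: [2, 0], 2: [3, 1], 3: [0, 2]}
print([len(f) for f in faces_from_rotation(square)])   # [4, 4]
```

A quick sanity check on the output is Euler's formula $|V| - |E| + |F| = 2$ for a connected plane graph.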

We will also use three queues $Q_4$, $Q_5$, $Q_{\ge 6}$ storing faces of length 4, 5 and $\ge 6$, respectively, that satisfy the conditions described in Cases 5, 6 and 4 of the proof of Theorem 1, respectively.

Low Degree Vertices Queue. In order to recognize Case 1 fast we maintain a queue 
storing the vertices of degree at most 2. 
Short Path Data Structure (SPDS). In order to search efficiently for 2-paths joining a given pair of vertices, we maintain the Short Path Data Structure described in § 4.



3.2 Recursion 

Note that in the recursive algorithm induced by the proof given in § 2 we need to split $G$ into two parts. Then each of the parts is processed separately by successive recursive calls. By splitting the graph we mean splitting the adjacency lists and all the other data structures described in the previous section. As the worst-case depth of the recursion is $O(n)$, the naive approach would spend $O(n^2)$ total time on splitting the graph. Instead, before splitting the information on $G$, our algorithm finds the smaller of the two parts. This can easily be done using two DFS calls run in parallel, one in each part: each time we find a new vertex in one part, we suspend the search in this part and continue searching in the other. This approach finds the smaller part $A$ in time linear in the size of $A$. The other part will be denoted by $B$. Then we can easily split the adjacency lists, the face queues and the low degree vertices queue. The vertices of the separating cycle (or path) are copied and the copies are added to $A$. Note that there are at most 6 such vertices. Next, we delete all the vertices of $V(A)$ from the SPDS. We will refer to this operation as Cut. As a result of Cut we obtain an SPDS for $B$. A Short Path Data Structure for $A$ is computed from scratch. In § 4 we show that deleting an edge from the SPDS takes $O(1)$ time and that the new SPDS can be built in $O(|A|)$ time. Thus the splitting is performed in $O(|V(A)|)$ worst-case time.
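The "two searches in parallel" idea can be sketched with Python generators. The sketch makes a few assumptions: the separating cycle or path is passed as a set `blocked` of vertices that neither search may enter, one start vertex is given in each part, and the function names are hypothetical. Each search is an iterative, stack-based traversal rather than a strict DFS, which does not matter for finding the part. The searches alternate one discovered vertex at a time, so the first one to run out has explored the smaller part, while the other has discovered at most one more vertex.

```python
def _search(adj, start, blocked):
    """Stack-based graph search yielding each vertex of start's part exactly once."""
    seen = {start}
    stack = [start]
    yield start
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
                yield w


def smaller_part(adj, s1, s2, blocked):
    """Vertex set of the smaller of the two parts containing s1 and s2."""
    searches = [_search(adj, s1, blocked), _search(adj, s2, blocked)]
    found = [set(), set()]
    while True:
        for k in (0, 1):
            try:
                found[k].add(next(searches[k]))
            except StopIteration:      # this search has exhausted its whole part
                return found[k]


# usage: the path 0-1-2-3-4-5-6 split by the separator {2}
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(smaller_part(adj, 5, 1, {2}))   # {0, 1}
```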

Proposition 1. The total time spent by the algorithm on splitting data structures before recursive calls is $O(n \log n)$.

Proof. We can assume that each split divides the graph into two parts; the possible split into three parts described in Case 2 is treated as two successive splits. Let us call the vertices of the separating cycle (or path) outer vertices, and the remaining vertices of the smaller part $A$ inner vertices. The total time spent on splitting data structures is linear in the total number of inner and outer vertices. As there are $O(n)$ splits, and each split involves at most 6 outer vertices, the total number of outer vertices to be considered is $O(n)$. Moreover, during each split of a $k$-vertex graph there are at most $\lfloor k/2 \rfloor$ inner vertices, so each vertex of the input graph becomes an inner vertex at most $\log n$ times. Hence the total number of inner vertices is $O(n \log n)$. □

3.3 Non-trivial Operations 

Identifying Vertices. In this section we describe how our algorithm updates the data structures described in § 3.1 during the operation of identifying a pair of vertices $u$, $v$. Identifying two vertices can be performed using deletions and insertions. More precisely, the operation Identify($u$, $v$) is executed using the following algorithm. First it compares the degrees of $u$ and $v$ in graph $G$. Assume that $\deg_G(u) \le \deg_G(v)$. Then for each neighbor $x$ of $u$ we delete edge $ux$ from $G$ and add a new edge $vx$, unless it is already present in the graph.
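The Identify procedure can be sketched on a simple graph stored as a dict of neighbour sets. The sketch is a hypothetical illustration. It only maintains adjacency, not the embedding order, the face lists or the SPDS that the algorithm also updates. It assumes, as in the cases of the proof, that $u$ and $v$ are not adjacent. It moves the edges of the lower-degree endpoint, which is what the $O(n \log n)$ bound of Lemma 2 relies on.

```python
def identify(adj, u, v):
    """Identify non-adjacent vertices u and v of a simple graph
    (adj: vertex -> set of neighbours), moving the edges of the lower-degree one.
    Returns (surviving vertex, number of deleted edges ux)."""
    if len(adj[u]) > len(adj[v]):
        u, v = v, u                       # now deg(u) <= deg(v)
    assert v not in adj[u], "the sketch only identifies non-adjacent vertices"
    moved = 0
    for x in list(adj[u]):
        adj[u].discard(x)                 # delete edge ux
        adj[x].discard(u)
        if x not in adj[v]:               # add vx unless it is already present
            adj[v].add(x)
            adj[x].add(v)
        moved += 1
    del adj[u]
    return v, moved


g = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # a 4-cycle
print(identify(g, 0, 2), g)   # (2, 2) {1: {2}, 2: {1, 3}, 3: {2}}
```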
Lemma 2. The total number of pairs of delete/insert operations performed by the Identify algorithm is bounded by O(n log n).

Proof. For now, assume that there are no edge deletions performed by our algorithm other than those involved with identifying vertices. The operation of deleting edge ux and adding vx during Identify(u, v) will be called moving edge ux. We see that each edge of the input graph can be moved at most ⌊log n⌋ times, for otherwise there would appear a vertex of degree > n. Consequently, there are O(n log n) pairs of delete/insert operations performed during Identify operations. It is clear that this number does not increase when we consider additional deletions. □

As we always identify a pair of vertices in the same facial walk, it is straightforward to update the adjacency lists. Lemma 2 shows that we need O(n log n) time in total for these updates, including updating the information about the faces incident to x and y. Each of the affected faces is then placed in the appropriate face queue (if its queue has to be changed). In § 4 we show how to update the Short Path Data Structure efficiently after Identify. To sum up, identifying vertices takes O(n log n) time in total, including updating the data structures.

Finding Short Paths. In our algorithm we need to find paths of length 1, 2 or 3 between given pairs of vertices. Observe that there are O(n) such queries during an execution of the whole algorithm. As we show in § 4, paths of length 1 or 2 can be found in O(1) time using the Short Path Data Structure. It remains to focus on paths of length 3.

Let us describe an algorithm Path3 that will be used to find a 3-path between a pair of vertices u, v, if there is any. We can assume that we are given a path p of length 2 (cases 4b and 5a) or of length 3 (Case 6g) joining u and v. W.l.o.g. we assume that deg(u) ≤ deg(v). Let A(u), A(v) denote the adjacency lists of u and v, respectively. We start by assigning variables x1 and x2 to the element of A(u) corresponding to the edge of p incident with u. Similarly, we assign x3 and x4 to the element of A(v) corresponding to the edge of p incident with v. Then we start a loop. We assign x1 to the succeeding element and x2 to the preceding element in A(u). Similarly, we assign x3 to the succeeding element and x4 to the preceding element in A(v). Then we use the Short Path Data Structure to search for paths of length 2 between x1 and v, x2 and v, x3 and u, and x4 and u. If a path is found, the algorithm stops; otherwise we repeat the loop. If no 3-path exists at all, the loop stops when all the neighbors of u have been checked.
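A sketch of the Path3 scanning order under simplifying assumptions: adjacency lists are Python lists in rotation order, `pu` and `pv` are the positions of the edges of p in A(u) and A(v), and the SPDS 2-path query is replaced by a brute-force stand-in `two_path_middle`. Both function names are hypothetical. Because of the stand-in, this code illustrates only which pairs are queried, not the O(deg u) running time.

```python
def two_path_middle(adj, a, b, forbidden):
    """Stand-in for an SPDS 2-path query: some y with a-y-b, y not in forbidden."""
    for y in adj[a]:
        if y != b and y not in forbidden and b in adj[y]:
            return y
    return None


def path3(rot, u, v, pu, pv):
    """Look for a 3-path between u and v other than the given path p.

    rot[w] lists the neighbours of w in rotation (cyclic) order; pu and pv are
    the positions in rot[u] and rot[v] of the edges of p incident with u and v.
    """
    adj = {w: set(ns) for w, ns in rot.items()}
    if len(rot[u]) > len(rot[v]):            # ensure deg(u) <= deg(v)
        u, v, pu, pv = v, u, pv, pu
    du, dv = len(rot[u]), len(rot[v])
    for step in range(1, du // 2 + 1):
        # x1 / x2: successor / predecessor of the p-edge in A(u)
        for x in (rot[u][(pu + step) % du], rot[u][(pu - step) % du]):
            y = None if x == v else two_path_middle(adj, x, v, {u})
            if y is not None:
                return [u, x, y, v]
        # x3 / x4: successor / predecessor of the p-edge in A(v)
        for x in (rot[v][(pv + step) % dv], rot[v][(pv - step) % dv]):
            y = None if x == u else two_path_middle(adj, x, u, {v})
            if y is not None:
                return [u, y, x, v]
    return None   # all neighbours of u checked: no 3-path, so u and v are identified
```

The p-edges themselves are never queried, so any path found starts or ends with an edge not on p and therefore differs from p.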

Lemma 3. The total time spent on performing the Path3 algorithm is O(n log n).

Proof. We can divide these operations into two groups. An operation is called successful if there exists a 3-path joining u and v, distinct from p when |p| = 3, and failed otherwise. Recall that when there is no such 3-path, vertices u and v are identified. As the time complexity of a single execution of the Path3 algorithm is O(deg u) and all the edges incident with u are deleted during the Identify(u, v) operation, the total time spent on performing failed Path3 operations is linear in the number of edge deletions caused by identifying vertices. By Lemma 2 there are O(n log n) such edge deletions.

Now it remains to estimate the time used by successful operations. Let us consider one of them. Let C be the separating cycle composed of the path p and the 3-path that was found. Let H be the graph C ∪ int(C). Recall that one of the vertices u, v is not colored, say v. Then the number of queries sent to the SPDS is at most 4·deg_H(u). Note that since v will be colored in graph H, the total number of queries asked during executions of successful Path3 operations is at most 8 times larger than the total number of edges appearing in G (an edge xv added after deleting xu during Identify(u, v) is not counted as a new one here). Thus the time used by successful operations is O(n). □

Removing a Vertex of Degree at Most 3. In cases 1 and 6i we remove vertices of degree 1, 2 or 3. There are at most O(n) such operations in total. As the degrees are bounded, the total time spent on updating the Short Path Data Structure and the adjacency lists is O(n). We also need to update the information about incident faces, since we may need to join two or three of them. It is easy to update the facial walk of the resulting face in O(1) time by joining the walks of the faces incident to the deleted vertex. The problem is that edges contained in the walk corresponding to the new face store pointers to two different faces. Thus we use the well-known Find and Union algorithm (see e.g. [5]) for finding the correct face adjacent to a given edge. The amortized time of the search is O(log* n). As the total number of all these searches is O(n), it does not increase the overall time complexity of our coloring algorithm.
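A standard disjoint-set forest can map the stale face identifiers stored on edges to the current face. This sketch (the class name is hypothetical) uses union by rank with path halving; its amortized cost per operation is within the O(log* n) bound quoted above.

```python
class FaceUnionFind:
    """Disjoint sets of face identifiers: merged faces share one representative."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def add(self, f):
        self.parent.setdefault(f, f)
        self.rank.setdefault(f, 0)

    def find(self, f):
        while self.parent[f] != f:
            self.parent[f] = self.parent[self.parent[f]]   # path halving
            f = self.parent[f]
        return f

    def union(self, f, g):
        rf, rg = self.find(f), self.find(g)
        if rf == rg:
            return rf
        if self.rank[rf] < self.rank[rg]:
            rf, rg = rg, rf                                # attach the lower tree
        self.parent[rg] = rf
        if self.rank[rf] == self.rank[rg]:
            self.rank[rf] += 1
        return rf
```

When a vertex is removed, the union operation merges the incident faces; an edge still pointing to an old face identifier is resolved with `find`.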

Before deleting an edge we additionally need to check whether it is a bridge. This can easily be done by verifying whether the right face of the edge equals its left face. If so, after deleting the edge the face is split into two faces in O(1) time and we process each of the connected components recursively.

Searching for Faces. To search for the faces described in cases 5, 6 and 4 of the proof from § 2 we use queues Q4, Q5 and Q≥6, respectively. The queues are initialized in O(n) time and the searches are performed in O(1) time.

Additional Remarks. To recognize cases 2, 3, 4a and 6c-6f efficiently it suffices to pass down the outer cycle in the recursion when the cycle is colored. Then it takes only O(1) time to recognize each case (in some cases we use the Short Path Data Structure) since there are at most 6 colored vertices.



4 Short Path Data Structure 

In this section we describe the Short Path Data Structure (SPDS), which can be built in linear time and enables finding shortest paths of length at most 2 in planar graphs in O(1) time. Moreover, we show how to update the structure after deleting an edge, adding an edge and identifying a pair of vertices. Then we analyze the total time needed for updates of the SPDS during the particular sequence of operations appearing in our 3-coloring algorithm. This time turns out to be bounded by O(n log n).

4.1 The Structure and Processing the Queries 

The Short Path Data Structure consists of two elements, denoted G⃗1 and G⃗2. We will describe them after introducing some basic notions.

A directed graph is said to be k-oriented if every vertex has out-degree at most k. If one can orient the edges of an undirected graph H obtaining a k-oriented graph H⃗, we say that H can be k-oriented. In particular, when k = O(1) we will say that H⃗ is O(1)-oriented and that H can be O(1)-oriented. H⃗ will denote a certain orientation of a graph H. The arboricity of a graph H is the minimal number of forests needed to cover all the edges of H. Observe that a graph with arboricity a can be a-oriented.
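The observation that arboricity a implies an a-orientation can be made concrete: given a cover of the edges by a forests, orient every tree towards a root, so that each vertex gains at most one out-edge per forest. A minimal sketch, assuming the forest cover is given as input (computing one is a separate problem) and using a hypothetical function name:

```python
from collections import defaultdict, deque


def orient_from_forests(forests):
    """Turn a cover by k forests into an orientation with out-degree <= k.

    forests: list of edge lists, each edge list forming a forest.
    Returns a dict mapping every vertex to the set of its out-neighbours.
    """
    out = defaultdict(set)
    for forest in forests:
        nbrs = defaultdict(list)
        for a, b in forest:
            nbrs[a].append(b)
            nbrs[b].append(a)
        seen = set()
        for root in list(nbrs):
            if root in seen:
                continue
            seen.add(root)
            queue = deque([root])
            while queue:                  # BFS from the root of this tree
                x = queue.popleft()
                for y in nbrs[x]:
                    if y not in seen:
                        seen.add(y)
                        out[y].add(x)     # child -> parent: one out-edge per forest
                        queue.append(y)
    return dict(out)
```

For example, K4 is covered by the two paths 0-1-2-3 and 2-0-3-1, so it gets an orientation with out-degree at most 2.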

Graph G⃗1 and Adjacency Queries. In this section G1 denotes a planar graph for which we build an SPDS (recall from § 3.2 that it is not only the input graph). It is widely known that planar graphs have arboricity at most 3. Thus G1 can be O(1)-oriented. Let G⃗1 denote such an orientation of G1. Then xy ∈ E(G1) iff (x, y) ∈ E(G⃗1) or (y, x) ∈ E(G⃗1). Thus, provided that we can maintain bounded out-degrees in G⃗1 during our coloring algorithm, we can process in O(1) time queries of the form “Are vertices x and y adjacent?”.
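With such an orientation the adjacency query inspects only the two out-neighbour sets, each of bounded size. A minimal sketch, assuming the orientation is stored as a dictionary `out` from vertices to sets of out-neighbours (a hypothetical representation):

```python
def adjacent(out, x, y):
    """xy is an edge of G1 iff (x, y) or (y, x) is an arc of the orientation."""
    return y in out.get(x, ()) or x in out.get(y, ())
```

With plain lists instead of sets each membership test scans at most the out-degree bound, which is still O(1).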

Graph G2. Let G2 be a graph with the same vertex set as G1. Moreover, edge vw is in G2 iff there exists a vertex x ∈ V(G1) such that (x, v) ∈ E(G⃗1) and (x, w) ∈ E(G⃗1). Vertex x is said to support edge vw. Since G⃗1 has bounded out-degree, every vertex supports O(1) edges in G2. Hence G2 is of linear size. The following lemma states even more (the proof is omitted due to space limitations):

Lemma 4. Let G⃗1 be a directed planar graph with out-degree bounded by d. Let G2 be an undirected graph with V(G2) = V(G⃗1) and E(G2) = {vw : (x, v) ∈ E(G⃗1) and (x, w) ∈ E(G⃗1)}. Then G2 is a union of at most 4·(d choose 2) planar graphs.



Corollary 1. If the out-degree in graph G⃗1 is bounded by d, then graph G2 has arboricity bounded by 12·(d choose 2).



Corollary 2. Graph G2 can be O(1)-oriented.

Corollary 1 follows immediately since the arboricity of a planar graph is at most 3. By G⃗2 we will denote an O(1)-orientation of G2. Let e be an edge in G⃗2 equal to (v, w) or (w, v). Let x be a vertex that supports e and let e1 = (x, v) and e2 = (x, w), so that e1, e2 ∈ E(G⃗1). We say that edges e1 and e2 are parents of e and that e is a child of e1 and e2. We say that the pair {e1, e2} is a couple of parents of e. Notice that an edge can have more than one couple of parents. We additionally store the following information:

- for each e ∈ E(G⃗2), a list P(e) of all pairs {e1, e2} such that e is a common child of e1 and e2,

- for each e ∈ E(G⃗1), a list C(e) of pairs (c, p), where c ∈ E(G⃗2) is a common child of e and a certain edge f, and p is a pointer to {e, f} in the list P(c).
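A static construction of G2 from a bounded out-degree orientation can be sketched as follows. It makes the simplifying assumption that each G2 edge stores the set of its supporting vertices x rather than the explicit parent pairs of P(e). A supporter x determines the couple {(x, v), (x, w)}, so the two representations carry the same information. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_g2(out):
    """Map each G2 edge {v, w} to the set of vertices x with arcs x->v and x->w."""
    support = defaultdict(set)
    for x, heads in out.items():
        for v, w in combinations(list(heads), 2):   # every pair of out-neighbours of x
            support[frozenset((v, w))].add(x)
    return dict(support)
```

A vertex of out-degree d supports at most d(d − 1)/2 edges, so the result has linear size when d = O(1).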

Queries About 2-Paths. It is easy to see that when G⃗1 and G⃗2 are O(1)-oriented we can find a path uxv of length 2 joining a pair of given distinct vertices u, v in O(1) time as follows:

(i) check whether there is an oriented path uxv or vxu in G⃗1,

(ii) check whether there is a vertex x such that (u, x), (v, x) ∈ E(G⃗1),

(iii) check whether there is an edge e = (u, v) or e = (v, u) in G⃗2. If so, pick any of its couples of parents {(x, u), (x, v)} stored in P(e).
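Steps (i)-(iii) cover the four ways of orienting the two edges of a path u-x-v. A sketch, assuming G⃗1 is given as out-neighbour sets `out` and G2 as a mapping `g2` from edges to their supporting vertices; the function name is hypothetical:

```python
def two_path(out, g2, u, v):
    """Return a middle vertex x of some path u-x-v, or None if there is none."""
    ou, ov = out.get(u, set()), out.get(v, set())
    for x in ou:                                # (i) u -> x -> v
        if v in out.get(x, ()):
            return x
    for x in ov:                                # (i) v -> x -> u
        if u in out.get(x, ()):
            return x
    common = ou & ov                            # (ii) u -> x <- v
    if common:
        return next(iter(common))
    supporters = g2.get(frozenset((u, v)))      # (iii) x -> u and x -> v
    if supporters:
        return next(iter(supporters))
    return None
```

Each step touches only out-neighbour sets of bounded size, which is why the query takes O(1) time.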

To build the Short Path Data Structure for a given graph G1, namely graphs G⃗1 and G⃗2, we start by creating two graphs containing the same vertices as G1 but no edges. Then for every edge uv of G1 we add uv to the SPDS using the insertion algorithm described in the following section. In § 4.3 we show that this takes only O(|V(G1)|) time.



4.2 Inserting and Deleting Edges 

Maintaining Bounded Out-degrees in G⃗1 and G⃗2. As graph G1 is dynamically changing, we will need to add and remove edges from G⃗1 and G⃗2. While removing is easy, after adding an edge we may need to reorient some edges to keep the graph O(1)-oriented. We use the approach of G. Brodal and R. Fagerberg [3]. Assume that H is an arbitrary graph of arboricity a. Let Δ = 6a − 1 and assume that we want to maintain a Δ-orientation of H, denoted by H⃗. They consider the following routine for inserting an edge uv. First add (u, v) to H⃗. If outdeg_H⃗(u) = Δ + 1, then repeatedly a node w with out-degree larger than Δ is picked, and the orientation of all the edges outgoing from w is changed. Deleting an edge does not need any reorientations. We will refer to these routines as Insert(u, v) and Delete(u, v), respectively, and we will use them for inserting and deleting edges in G⃗1 and G⃗2. Observe that the out-degrees in G⃗1 will be bounded by 17, since G1 has arboricity 3 as a planar graph. Consequently, Corollary 1 guarantees that G2 has bounded arboricity, and hence G⃗2 will also be O(1)-oriented.
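The Insert and Delete routines can be sketched as below, assuming the orientation is a dictionary of out-neighbour sets and Δ is passed as `delta`; the function names are hypothetical. `insert_edge` returns the number of edge reorientations, the quantity analysed in § 4.3. Termination relies on the arboricity condition from [3]; the code itself does not check it.

```python
def insert_edge(out, u, v, delta):
    """Add arc (u, v); while some vertex has out-degree > delta, reverse all its
    out-edges. Returns the number of reoriented edges."""
    out.setdefault(u, set()).add(v)
    out.setdefault(v, set())
    flips = 0
    overfull = [u] if len(out[u]) > delta else []
    while overfull:
        w = overfull.pop()
        if len(out[w]) <= delta:
            continue
        heads = list(out[w])
        out[w].clear()
        for y in heads:                   # reverse every edge leaving w
            out[y].add(w)
            flips += 1
            if len(out[y]) > delta:
                overfull.append(y)
    return flips


def delete_edge(out, u, v):
    """Deletion never needs reorientations."""
    out.get(u, set()).discard(v)
    out.get(v, set()).discard(u)
```

For instance, with Δ = 2 inserting a star with five leaves from its centre triggers one flip of three edges.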

Updating the SPDS After a Deletion. Note that after deleting an edge from G⃗1 we need to find out which edges of G⃗2 should be deleted, if any. Assume that e is an edge of G⃗1 that is going to be deleted. For each pair (c, p) ∈ C(e) we have to perform the following operations: remove the pair {e, f} referenced by pointer p from the list P(c), and remove the corresponding pair from the list C(f). If the list P(c) becomes empty, we delete edge c from G⃗2. Since e has at most d = O(1) children, the following proposition holds:

Proposition 2. After deletion of an edge the SPDS can be updated in O(1) time.



Updating the SPDS After an Insertion. To perform an insertion we first call Insert in graph G⃗1. Whenever any edge in G⃗1 changes its orientation, we act as if it was deleted and update G⃗2 as described above. Moreover, when any edge (u, v) appears in G⃗1, both after Insert(u, v) and after reorienting (v, u), we add an edge vw to G⃗2 for each edge (u, w) present in G⃗1.
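Keeping G2 consistent with arc changes in G⃗1 can be sketched as a small class. Like the static sketch of G2, it replaces the P(e) and C(e) lists by supporter sets, which is a simplification. A reorientation of (x, v) is modelled as a deletion followed by an insertion, as in the text. The class and method names are hypothetical.

```python
from collections import defaultdict


class SupportIndex:
    """Orientation of G1 plus the G2 edges, each mapped to its supporters."""

    def __init__(self):
        self.out = defaultdict(set)       # arcs of the orientation of G1
        self.support = {}                 # frozenset({v, w}) -> {x : x->v, x->w}

    def add_arc(self, x, v):
        if v in self.out[x]:
            return
        for w in self.out[x]:             # (x, v) and (x, w) become parents of vw
            self.support.setdefault(frozenset((v, w)), set()).add(x)
        self.out[x].add(v)

    def remove_arc(self, x, v):
        if v not in self.out[x]:
            return
        self.out[x].discard(v)
        for w in self.out[x]:
            key = frozenset((v, w))
            self.support[key].discard(x)
            if not self.support[key]:     # no couple of parents left: delete vw
                del self.support[key]

    def reorient(self, x, v):
        """Flip arc (x, v) into (v, x), treated as a deletion plus an insertion."""
        self.remove_arc(x, v)
        self.add_arc(v, x)

    def in_g2(self, v, w):
        return frozenset((v, w)) in self.support
```

Each call touches one out-neighbour set of bounded size, matching the O(1) per-change cost of Proposition 2.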



4.3 Time Complexity of Updating the SPDS 

There are four operations performed by our coloring algorithm on an input graph. The first two are insertions and Identify operations. The third one is deleting a vertex of degree at most 3 and will be denoted DeleteVertex. The last operation is Cut, described in § 3.2. These four operations will be called meta-operations. Each of them will be performed using a sequence of deletions and insertions. In this section we show that for the particular sequence of meta-operations appearing in our coloring algorithm the total time needed for updating the SPDS is O(n log n). The following lemma is a slight generalization of Lemma 1 from [3]. Their proof remains valid for the modified formulation presented below.

Lemma 5. Given an arboricity a preserving sequence σ of edge insertions and deletions on an initially empty graph, let H_i be the graph after the i-th operation.

Let H⃗_1, H⃗_2, ..., H⃗_|σ| be any sequence of (2a)-orientations with at most r edge reorientations in total, and let F⃗_1, F⃗_2, ..., F⃗_|σ| be the Δ-orientations of H_1, H_2, ..., H_|σ| appearing when the Insert and Delete algorithms are used to perform σ. Let k be the number of edges uv such that (u, v) ∈ F⃗_i, (v, u) ∈ H⃗_i and the i-th operation in σ is the insertion of uv.

Then the Insert and Delete algorithms perform at most 3(k + r) edge reorientations in total on the sequence σ.



Lemma 6. The total number of reorientations in G⃗1 used by the Insert and Delete algorithms to perform a sequence of k meta-operations leading to the empty graph is O(k).



Proof. Let us denote the sequence of meta-operations by σ. Let G^i be the graph G1 after the i-th meta-operation. We will construct a sequence of 6-orientations Q = G⃗^1, ..., G⃗^k. Observe that G^k has no edges, thus it is 6-oriented. We will describe G⃗^i using G⃗^{i+1}. If σ_i is an insertion of edge uv, G⃗^i is obtained from G⃗^{i+1} by deleting uv. If σ_i = Identify(u, v), the edges incident with the identified vertex x in G⃗^{i+1} are partitioned into two groups in G⃗^i without changing their orientation. Recall that u and v are not adjacent in G^i. Thus outdeg_{G⃗^i}(u) ≤ outdeg_{G⃗^{i+1}}(x) and outdeg_{G⃗^i}(v) ≤ outdeg_{G⃗^{i+1}}(x).

If σ_i = DeleteVertex, we simply add a vertex of degree at most 3 to G⃗^i in such a way that the added edges leave the new vertex. Its out-degree is at most 3 and the out-degrees of the other vertices do not change. Finally, we consider the case when σ_i is a Cut operation. Let A be the graph induced by the vertices removed from G^i. As A is planar, it can be 3-oriented. The remaining edges, joining a vertex from A, say x, with a vertex from G^{i+1}, say y, are oriented from x to y. Observe that a vertex in A can be adjacent to at most 3 vertices in G^{i+1}, since there are no triangles. Thus G⃗^i is 6-oriented. The description of the sequence Q of orientations is now finished. Observe that there is not a single reorientation in this sequence.

Now let us compare the sequence Q with the sequence T of 17-orientations appearing when the Insert and Delete algorithms are used to perform σ. Consider the insertions involved with a single Identify operation. Then at most 6 + 17 = O(1) inserted edges obtain different orientations in Q and T. Hence the total number of such insertions involved with identifying vertices is O(k). Operations DeleteVertex and Cut do not use insertions, and trivially there are at most k ordinary insertions in σ. Thus by Lemma 5 the Insert and Delete algorithms perform O(k) reorientations in G⃗1 to perform σ. □



Lemma 7 (proof omitted). Let σ be a sequence of k + 1 meta-operations performed on an initially empty planar graph G1 = (V, E). Moreover, let the first k operations in σ be insertions. Then the total number of reorientations in G⃗2 used by the Insert and Delete algorithms to perform σ is O(1 log |V|).
Corollary 3. For any planar graph G1 the Short Path Data Structure can be constructed in O(|V(G1)|) time.

Corollary 4. Let n be the number of vertices of the input graph of the coloring algorithm. The total time needed to perform all meta-operations executed by the coloring algorithm, including updating the Short Path Data Structure, is O(n log n).

Proof. By Lemma 2 all Identify operations cause O(n log n) pairs of deletions/insertions. By Proposition 1 all Cut operations cause O(n log n) deletions. There are O(n) meta-operations performed. Hence, by Lemmas 6 and 7, the total number of reorientations needed by insert operations is O(n log n). This ends the proof. □

Let us also note that, using techniques from the proof of Lemma 7, one can show that an update of the SPDS after an insertion in an arbitrary sequence of insert/delete operations takes O(log² n) amortized time. The approach presented here can be extended to a general version of the SPDS. It needs O(1) time to find a shortest path between vertices at distance bounded by a constant k, and updates after an insertion take O(log^k n) amortized time.
Abstract. We study the complexity of, and algorithms to construct, approximations of the union of lines and of the Minkowski sum of two simple polygons. We also study thick unions of lines and Minkowski sums, which are inflated with a small disc. Let b = D/ε be the ratio of the diameter of the region of interest and the distance (or error) of the approximation. We present upper and lower bounds on the combinatorial complexity of approximate and thick unions of lines and Minkowski sums, with bounds expressed in b and the input size n. We also give efficient algorithms for the computation.



1 Introduction 

Minkowski sums are a basic concept in motion planning and, therefore, they have been studied extensively in computational geometry. For two sets P and Q in the plane, the Minkowski sum is defined as P ⊕ Q = {p + q | p ∈ P, q ∈ Q}, where p and q are points whose coordinates are added; see Figure 1 for an example. Minkowski sums are used to define the free configuration space of a translating robot amidst obstacles [13,19]. For essentially the same reason, Minkowski sums are important in packing problems, where given shapes have to be packed without overlap and without rotation inside a given region. This problem shows up in the efficient use of fabric in cloth manufacturing [3,11,15]. Other applications are in cartography [21]. One of the basic questions that arises is: given two simple polygons with n vertices in total, compute their Minkowski sum. All algorithms for this problem must take Ω(n^4) time, simply because the output can have that large a complexity.
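The definition can be illustrated in the special case of two convex polygons, where P ⊕ Q is the convex hull of all pairwise vertex sums. The Ω(n^4) behaviour above arises only for non-convex polygons, which this sketch does not handle. It uses Andrew's monotone chain for the hull; the function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise,
    starting at the lexicographically smallest point (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_sum_convex(P, Q):
    """P ⊕ Q for convex polygons P and Q given by their vertices."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])
```

For example, the sum of the unit square with itself is the square with corners (0, 0) and (2, 2).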

In practice, Minkowski sums of high complexity rarely show up, and it is interesting to discover under what conditions the maximum complexity of the Minkowski sum is considerably lower. Similarly, it is interesting to know when we can compute it more efficiently. Questions like these have led to research on realistic input models and the development of geometric algorithms for fat objects. One of the main results is that the union complexity of n triangles, all of whose angles are at least some given constant, is O(n log log n) rather than
Θ(n^2) [14,18]. More recently, Erickson showed that two local polygons with n vertices have a Minkowski sum of complexity O(n^2 log n) [9]. Much research has been done on realistic input models; see [4,5,6,7,8,16,20,22] for a sample.
Fig. 1. Left: Minkowski sum of P and Q. Right: The Minkowski sum of two polygons can have complexity Θ(n^4) in the worst case.



This paper also deals with Minkowski sums and ways to overcome the lower bound for construction. We will not make assumptions on the input polygons, but instead study approximate Minkowski sums. We allow the Minkowski sum that we compute to be off by a distance ε, and assume that there is a region of interest with diameter D. We let b = D/ε be the relative precision, and we prove complexity and running time bounds for approximate Minkowski sums. We assume throughout the paper that b = Ω(1). If we do not make assumptions on both ε and D, we cannot get any bound better than Θ(n^4).

As a simpler first step we consider approximate unions of lines. We show that a union of n lines admits an approximation of complexity O(min(b^2, n^2)), which is optimal in the worst case, and we can compute it in O(n + n^{2/3+δ} b^{4/3}) time assuming b = O(n), for any δ > 0. A special type of approximate union is the union of thick lines, or strips. It is the maximal subset of the plane that is approximate. Being off by a distance of at most ε implies that the strips have width 2ε. We show that n such strips in a region of diameter D have union complexity Ω(n + b^2) and O(n + b√(bn)) in the worst case, and we can compute it in O(n^{4/3+δ} b^{2/3}) time assuming b = O(n).

Then we proceed with approximate and thick Minkowski sums. For thick Minkowski sums we prove a lower bound of Ω(n^2 + b^2) and an upper bound of O((n + b) n √b log n + n^2 log n) on the complexity. The approximation can be computed in O(min{nb^2, …}) time and the thick Minkowski sum in O(n^{8/3+δ} b^{2/3}) time.

Interestingly, papers on fatness also imply complexity bounds on thick arrangements and Minkowski sums [7,14,18]. The fatness parameter is related to b, and we would get a bound of O(nb log b) for the thick union of lines and O(n^2 b^2 log b) for the thick Minkowski sum [18]. The bounds presented in this paper are better for most values of b: for any γ > 0, we have better bounds if b = Ω(n^γ) and b = … (thick union of lines) or b = … (thick Minkowski sums).
2 Approximate and Thick Sets

We are interested in scenarios in which we do not need the exact shape of the 
union of lines or Minkowski sum, but instead may settle for a structure that 
is ‘close’. This freedom allows us to remove holes in the union or Minkowski 
sum that are smaller than the level of detail that we are interested in, thereby 
reducing the combinatorial complexity of these structures. The closeness can be 
expressed by an absolute precision value ε > 0. We will study unions of lines
and Minkowski sums in this approximate sense. 

Definition 1. An ε-approximation of a set C ⊆ R^2 is a set A ⊆ R^2 such that for any point p ∈ A there is a point p' ∈ C at distance at most ε, and for any point p ∈ C there is a point p' ∈ A at distance at most ε.
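For finite point sets, Definition 1 says that the Hausdorff distance between A and C is at most ε. A brute-force check, assuming both sets are given as lists of coordinate pairs (the function name is hypothetical, and the definition itself applies to arbitrary, possibly infinite sets):

```python
import math


def is_eps_approximation(A, C, eps):
    """Definition 1 for finite point sets: every point of A is within eps of C
    and every point of C is within eps of A (Hausdorff distance <= eps)."""
    def near(p, S):
        return any(math.dist(p, q) <= eps for q in S)
    return all(near(p, C) for p in A) and all(near(p, A) for p in C)
```

Both directions are needed: a single point of C far from A already breaks the condition, even if A lies entirely close to C.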

We will also study a particular ε-approximation. The ε-thick set of C is the largest-area ε-approximation of C. Let Δε be the disc of radius ε centered at the origin.

Definition 2. The ε-thick set of C ⊆ R^2 is the Minkowski sum C ⊕ Δε.

We are interested in ε-approximations of reduced combinatorial complexity. A small absolute precision value ε, however, does not lead to a complexity reduction. Let I ⊆ R^2 be a region of interest, and let D be its diameter. The interesting cases in terms of complexity reduction occur when the ratio of the diameter D of I and the precision value ε is bounded. The ratio b = D/ε can be seen as a measure of (the inverse of) the relative precision of the approximation.

3 Unions of Lines 

Let L be a set of n lines. We study the combinatorial complexity of approximate and thick unions of the lines in L. In addition we present algorithms for computing the approximate and thick unions.

3.1 Complexity 

The complexity of an arrangement of n lines is O(n^2) and this bound is tight. Approximations of lower complexity exist for the union of the n lines in L when the relative precision b = o(n). We first prove a lower bound.

Lemma 1. Any ε-approximation of the union of n lines has complexity Ω(b^2) in the worst case, assuming b = O(n).

Proof (sketch). Place a set L of O(b) horizontal and vertical lines at distance 6ε to form a regular square grid in I. This creates Ω(b^2) squares of 4ε by 4ε in which no point of the approximation may lie. We can argue that any (polygonal) approximation must have a vertex close to every one of the Ω(b^2) squares.

Lemma 2. Any union of n lines has an ε-approximation of size O(min(b^2, n^2)).
Proof. As the set L of lines is its own ε-approximation, the O(n^2) bound is trivial. Now consider the points of a regular ε × ε grid inside I. Any grid point closer than ε to some point on a line in L is chosen to be in the set A. It is clear that A is an ε-approximation of complexity O(b^2).

The ε-thick union of lines equals the union of ε-thick lines. Let L_ε = {l ⊕ Δε | l ∈ L}. All elements of L_ε are strips of width 2ε.

Lemma 3. The complexity of the ε-thick union of n lines can be Ω(n + b^2).

Proof. Choose b/3 equally spaced horizontal strips (e.g. a distance ε apart) that all intersect I. Choose b/3 vertical strips in the same way. The complexity of the union of these strips is Ω(b^2). We let Ω(n) strips with slightly different positive slopes intersect the uncovered lower right corner of I, such that each of the Ω(n) pairs of strips with successive slopes adds a vertex on an arbitrarily small circular arc centered at the corner.

A hole in the union of strips from L_ε originates from a hole with area Ω(ε^2) in the union of the lines from L. As a result, the number of holes in the union of strips that intersect I is O(b^2). We can now prove:

Lemma 4. The complexity of the ε-thick union of n lines is O(n + b√(bn)).

Proof. A vertex v of the union is the intersection of a line β_v bounding a strip s_v ∈ L_ε and a line β'_v bounding another strip s'_v ∈ L_ε. It is an endpoint of the equally long segments β_v ∩ s'_v and β'_v ∩ s_v, whose interiors lie in the interior of the union. We separate the vertices v inside I for which β_v ∩ s'_v intersects the boundary ∂I of I from those for which this segment lies entirely inside I.

If β_v ∩ s'_v intersects ∂I, we charge the vertex v to the intersection β_v ∩ s'_v ∩ ∂I = β_v ∩ ∂I. This intersection will be charged only once, giving O(n) vertices of this type in total.

We next handle the union vertices v in I for which β_v ∩ s'_v lies entirely inside I. Each union vertex v belongs to a convex hole. We define the (turning) angle α_v to be π − φ, where φ (φ < π) is the angle between the union edges incident to v (or, equivalently, the angle between β_v ∩ s'_v and β'_v ∩ s_v). Let γ = √(b/n). We bound the number of vertices v with α_v ≥ γ differently from the number of those with α_v < γ.

Observe that the sum of the (turning) angles of the vertices of a single hole equals 2π. Since the number of holes in I is O(b^2), the sum of the angles of all union vertices in I is O(b^2) as well. The number of union vertices v with α_v ≥ γ inside I is therefore O(b^2/γ) = O(b√(bn)).

Observe that the length of the part inside I of each bounding line of a strip is at most D. Since the number of strips intersecting I is O(n), the total length of all their bounding lines inside I is O(Dn). The intersection of the strips s_v and s'_v causes the interior of the segment β_v ∩ s'_v adjacent to v to be in the interior of the union. For a vertex v with α_v < γ this means that a stretch of length |β_v ∩ s'_v| ≥ ε/sin α_v > ε/α_v > ε/γ on β_v adjacent to v cannot contain another union vertex. The number of union vertices v with α_v < γ inside I is therefore O(nD/(ε/γ)) = O(b√(bn)).
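The choice γ = √(b/n) balances the two charging arguments above: it is the value at which b^2/γ equals bnγ. Written out, under the bounds just derived:

```latex
\begin{align*}
\#\{v : \alpha_v \ge \gamma\} &\le \frac{2\pi \cdot O(b^2)}{\gamma}
  = O\!\left(b^2 \sqrt{n/b}\right) = O\!\left(b\sqrt{bn}\right),\\
\#\{v : \alpha_v < \gamma\} &\le \frac{O(Dn)}{\varepsilon/\gamma}
  = O(bn\gamma) = O\!\left(bn\sqrt{b/n}\right) = O\!\left(b\sqrt{bn}\right),\\
\frac{b^2}{\gamma} = bn\gamma &\iff \gamma^2 = \frac{b}{n}.
\end{align*}
```

Adding the O(n) boundary-charged vertices gives the O(n + b√(bn)) bound of Lemma 4.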
3.2 Algorithms

As we observed, an approximate union of lines need only be a set of O(b^2) points (if b = O(n)) that is a subset of a regular square grid with spacing ε inside I. For every point of this grid we need to find out whether it lies inside at least one of the n strips of width 2ε centered at the input lines. Every grid point inside some strip is guaranteed to be within ε of a line, and every point on an input line is guaranteed to be within ε/√2 of a covered grid point. We can get the covered grid points in several ways. The easiest is to iterate over the strips and, for each strip, find the O(b) grid points inside it. This takes O(nb) time in total. If we want to find a superset of the union of lines which is also an approximate union, we can take groups of four grid points covered by the same strip and put the whole grid square in the approximate union. For groups of three we take the triangle that is half a grid square.
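The O(nb) strip-by-strip enumeration can be sketched as follows. The sketch assumes I = [0, size·ε]^2, grid points (iε, jε), and lines given in normalised form n_x·x + n_y·y = c with n_x^2 + n_y^2 = 1; these conventions and the function name are assumptions. A line closer to horizontal is scanned column by column, otherwise row by row, so each column or row contributes at most 3 grid points.

```python
import math


def covered_grid_points(lines, eps, size):
    """Indices (i, j) of grid points (i*eps, j*eps), 0 <= i, j <= size, lying
    within distance eps of at least one line nx*x + ny*y = c (nx^2 + ny^2 = 1)."""
    covered = set()
    for nx, ny, c in lines:
        if abs(ny) >= abs(nx):
            # Closer to horizontal: |ny| >= 1/sqrt(2), so the strip meets each
            # column in a y-interval of length <= 2*sqrt(2)*eps (<= 3 grid points).
            for i in range(size + 1):
                x = i * eps
                lo, hi = sorted(((c - nx * x - eps) / ny, (c - nx * x + eps) / ny))
                for j in range(max(0, math.ceil(lo / eps)), min(size, math.floor(hi / eps)) + 1):
                    covered.add((i, j))
        else:
            # Closer to vertical: scan rows symmetrically.
            for j in range(size + 1):
                y = j * eps
                lo, hi = sorted(((c - ny * y - eps) / nx, (c - ny * y + eps) / nx))
                for i in range(max(0, math.ceil(lo / eps)), min(size, math.floor(hi / eps)) + 1):
                    covered.add((i, j))
    return covered
```

Grid points at distance exactly ε from a line may be included or excluded due to floating-point rounding. Both outcomes are acceptable for an approximation.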

A more efficient algorithm when b is sufficiently smaller than n is obtained by building a data structure on the strips such that, for a query point, we can efficiently determine whether it lies inside some strip. Using partition trees, we can build a data structure of size O(m) in O(m^{1+δ}) time such that queries can be answered in O(n^{1+δ}/√m) time, for any n ≤ m ≤ n^2 [1]. We will query with the O(b^2) grid points, so we choose m to balance the preprocessing time and the total query time. The resulting algorithm takes O(n + b^2 + n^{2/3+δ} b^{4/3}) time.
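The balancing step can be written out as follows. This assumes the partition-tree trade-off stated above (O(m^{1+δ}) preprocessing, O(n^{1+δ}/√m) per query) and the regime n ≤ m ≤ n^2, i.e. n^{1/4} ≤ b ≤ n; δ is renamed where constant factors multiply it.

```latex
\begin{align*}
T(m) &= O\!\left(m^{1+\delta}\right) + O(b^2)\cdot O\!\left(\frac{n^{1+\delta}}{\sqrt{m}}\right),\\
m = \frac{b^2 n}{\sqrt{m}} &\iff m^{3/2} = b^2 n \iff m = b^{4/3} n^{2/3},\\
T &= O\!\left(n + b^2 + n^{2/3+\delta} b^{4/3}\right).
\end{align*}
```

Since b = O(n), the b^2 term is dominated by n^{2/3} b^{4/3}.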

Theorem 1. An ε-approximate union of n lines inside a square of diameter D has complexity O(b^2) and can be computed in O(nb) time and in O(n + n^{2/3+δ} b^{4/3}) time, for any δ > 0, where b = D/ε = O(n).

The union of strips is one of the problems known to be 3SUM-hard [10], implying that a subquadratic-time algorithm for its construction is not likely to exist. But since we have ε-thick strips and are interested in a region of diameter D only, we can obtain a subquadratic construction time.

To compute the union of n ε-thick strips we will make use of multi-level
partition trees and ray shooting [1]. All edges that appear in the thick union 
are segments on the set L of 2n lines bounding the strips, and we will shoot 
along these lines to find the exposed edges. We build two data structures: one 
for querying when we are outside the union of the strips (to find the end of an 
exposed edge), and one for querying when we are inside the union of the strips 
(to find the start of an exposed edge). The former structure is easy. We simply 
preprocess all lines L bounding strips into a ray shooting structure. 

It is more difficult to find the start of an exposed edge. When the starting point of the query ray is inside the union of the strips, we do not want to find the next intersection with some strip boundary, because it may not be part of the output, and we could be shooting O(n^2) rays. Instead, we build a partition tree T whose main tree allows us to select all strips that contain the query point q (the start of the ray), represented by a small number of canonical nodes. For every node in the main tree we build an associated structure for ray shooting in the represented strips, but from infinity and in the opposite direction. This way we find the point q' where the ray exits all strips in which the query point q lies. Either q' is a vertex of the union of strips and we start to find an exposed edge, or q' lies inside one or more strips, which are necessarily different from the ones that contained q. We use T to distinguish these cases. If q' lies inside, we continue to find the last exit point of the strips that contain q', as we did before for q. It is clear that this approach will find the next starting point along the line we are processing.

Lemma 5. Let q be any point inside the union of strips. If, after at most two queries as specified above, the point q'' lies in the union of strips, then it has distance > ε from q.

Proof. Suppose we query with q and find that q' is in the union. Then q' must lie in some strip S in which q does not lie. Point q'' must lie such that qq'' cuts through S completely, which proves the lemma.

The lemma shows that the number of ray shooting queries inside the union
is $O(b)$ for any line, so $O(nb)$ for all lines of $L$ together. Due to the
complexity bound, we also do at most $O(n + b\sqrt{bn})$ queries outside the union.
Since $b = O(n)$, we have $b\sqrt{bn} \le nb$, so we do $O(nb)$ ray shootings in total.
For both structures we have $O(m^{1+\delta})$ preprocessing time and $O(n^{1+\delta}/\sqrt{m})$
query time, where $n \le m \le n^2$, by the standard theory of partition
trees [1]. Balancing the preprocessing time against the total query time gives an
$O(n^{4/3+\delta} b^{2/3})$ time bound to determine all exposed edges. It is easy to assemble
these into the union of strips without affecting the time bound asymptotically.
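The balancing step can be written out as follows. This is a sketch under the stated partition-tree trade-off ($O(m^{1+\delta})$ preprocessing, $O(n^{1+\delta}/\sqrt{m})$ per query, $n \le m \le n^2$); constant factors and the usual renaming of $\delta$ are glossed over.

```latex
T(m) = O\!\left(m^{1+\delta}\right) + O(nb)\cdot O\!\left(\frac{n^{1+\delta}}{\sqrt{m}}\right),
\qquad
m^{3/2} = n^{2} b \;\Longrightarrow\; m = n^{4/3} b^{2/3} \le n^{2} \quad (\text{since } b = O(n)),
\qquad
T = O\!\left(n^{4/3+\delta}\, b^{2/3}\right).
```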

Theorem 2. The union of a set of $n$ $\varepsilon$-thick strips inside a square of diameter
$D$ has complexity $O(n + b\sqrt{bn})$ and can be constructed in $O(n^{4/3+\delta} b^{2/3})$ time, for
any $\delta > 0$, where $b = D/\varepsilon = O(n)$.

4 Minkowski Sums 

Let $P$ and $Q$ be polygons with $n$ vertices. We study the combinatorial complexity
of approximate and thick Minkowski sums $P \oplus Q$. In addition, we present
algorithms for computing approximate and thick Minkowski sums.

4.1 Complexity 

The complexity of the Minkowski sum of two polygons with $n$ vertices is $O(n^4)$,
and this bound is tight in the worst case. The following lemma says that approximations
of lower complexity exist when the relative precision satisfies $b = o(n^2)$.
We omit the proof as it is similar to that of Lemma 2.

Lemma 6. Any Minkowski sum of two polygons with $n$ vertices admits an
$\varepsilon$-approximation of complexity $O(\min(b^2, n^4))$, and this bound is tight.

We turn our attention to the thick Minkowski sum. 

Lemma 7. The complexity of the $\varepsilon$-thick Minkowski sum of two polygons with
$n$ vertices can be $\Omega(n^2 + b^2)$.

Fig. 2. The Minkowski sum of $P$, $Q$ and an $\varepsilon$-disc can have holes.

Proof. The bound of $\Omega(b^2)$ is trivial. Figure 2 shows that $P \oplus Q \oplus \Delta_\varepsilon$, where
$\Delta_\varepsilon$ is the disc of radius $\varepsilon$, can have $\Omega(n^2)$ holes. The sum of the lengths of the
$O(n)$ spikes of $P$ and of $Q$ is such that the distance from the $O(n^2)$ resulting spikes
of $P \oplus Q$ to the horizontal top bar of $P \oplus Q$ is (just below) $2\varepsilon$. The Minkowski
sum of $P \oplus Q$ with $\Delta_\varepsilon$ will have one tiny hole between each of the $O(n^2)$ pairs of
supporting lines of consecutive spikes of $P \oplus Q$.

We deduce an upper bound on the complexity of the thick Minkowski sum
by considering the complexity of the union boundary of all Minkowski sums of
an edge of $P$, a vertex of $Q$, and $\Delta_\varepsilon$, and all Minkowski sums of an edge of $Q$,
a vertex of $P$, and $\Delta_\varepsilon$. The boundary of the union of these racetracks (which
are in fact thick translated edges of $P$ or $Q$) is a superset of the boundary of
$P \oplus Q \oplus \Delta_\varepsilon$.

Let $C$ be the set of $O(n^2)$ racetracks. A racetrack is short if its diameter
is at most $4\varepsilon$ and long otherwise. Let $C_s$ and $C_l$ be the subsets of short and
long racetracks in $C$, respectively. We regard each racetrack as the union of two
discs of diameter $2\varepsilon$ and a rectangle of width $2\varepsilon$. Let $S$ be the set of all discs
resulting from racetracks in $C$, and let $R_l$ be the set of rectangles (which are all
longer than $2\varepsilon$) resulting from racetracks in $C_l$. We obtain a hexagonal inner
approximation of a racetrack if we replace both discs by enclosed squares with
side length $\varepsilon\sqrt{2}$ such that the diagonals of these squares coincide with the sides
of the rectangle that were originally covered by the discs. Observe that each
hexagonal racetrack is the Minkowski sum of a translated copy $e$ of an edge of
$P$ or $Q$ with a square with side length $\varepsilon\sqrt{2}$ whose diagonal is orthogonal to $e$.
Let $H_s$ be the set of all hexagonal racetracks resulting from racetracks in $C_s$.
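The hexagonal inner approximation can be written down explicitly. The Python sketch below (function name hypothetical) returns the six vertices, in counter-clockwise order, of $e \oplus$ square, where the square has side $\varepsilon\sqrt{2}$ and diagonals parallel and orthogonal to $e$; the vertices are the endpoints of $e$ moved by $\varepsilon$ along and across the edge direction. It assumes $e$ has positive length.

```python
import math


def hexagonal_racetrack(a, b, eps):
    """Vertices (counter-clockwise) of segment ab plus a square of side
    eps*sqrt(2) whose diagonals are parallel and orthogonal to ab."""
    (ax, ay), (bx, by) = a, b
    length = math.hypot(bx - ax, by - ay)
    ux, uy = (bx - ax) / length, (by - ay) / length   # unit edge direction
    nx, ny = -uy, ux                                   # unit normal (left side)
    return [
        (ax - eps * ux, ay - eps * uy),   # tip behind a
        (ax - eps * nx, ay - eps * ny),   # right side, at a
        (bx - eps * nx, by - eps * ny),   # right side, at b
        (bx + eps * ux, by + eps * uy),   # tip beyond b
        (bx + eps * nx, by + eps * ny),   # left side, at b
        (ax + eps * nx, ay + eps * ny),   # left side, at a
    ]
```

Every vertex lies at distance exactly $\varepsilon$ from $e$, so by convexity the hexagon is contained in the racetrack $e \oplus \Delta_\varepsilon$; its area is $2\varepsilon|e| + 2\varepsilon^2$.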

It is well known that the union complexity of a collection of unit discs is
linear in the number of discs. The complexity of $\bigcup S$, and hence the number of
vertices resulting from pairwise intersections of arcs in $C$, is therefore $O(n^2)$. For
the sake of brevity, we let $\bigcup A = \bigcup_{a \in A} a$ and $\bigcup X, Y = (\bigcup_{x \in X} x) \cup (\bigcup_{y \in Y} y)$. We
derive a bound on the number of remaining vertices of the union $\bigcup C = \bigcup C_s, C_l$
of racetracks in three steps. We will bound

- the number of vertices resulting from pairwise intersections of straight segments
in $C$ by considering the number of vertices in $\bigcup R_l, H_s$,

- the number of vertices resulting from intersections of a circular arc from $C$
and a straight segment from $C_l$ by considering the number of vertices in
$\bigcup R_l, S$, and

- the number of vertices resulting from intersections of a circular arc from $C$
and a straight segment from $C_s$ by considering the number of vertices in
$\bigcup C_s, S$.


Lemma 8. The number of holes in the union $\bigcup R_l, H_s$ is $O(n^2 + b^2)$.

Proof. The arrangement of line segments defined by the $O(n^2)$ Minkowski sums
of an edge of $P$ or $Q$ and a vertex of $Q$ or $P$ has $O(b^2)$ convex holes of area
$\Omega(\varepsilon^2)$ and $O(n^2)$ non-convex holes, as each of these holes must contain
an endpoint of a segment. Replacing the line segments by the corresponding
rectangles or hexagonal racetracks may lead to splits of holes. If one of the six
vertices of a racetrack $h \in H_s$ splits a hole, then that particular vertex ends up
inside $\bigcup R_l, H_s$. If an edge of $h$ splits a hole, then a vertex of another racetrack
or rectangle ends up inside $h$ and thus inside $\bigcup R_l, H_s$. Similar arguments apply
to the four vertices and edges of a rectangle $r \in R_l$. As a result, the number of
newly created holes is $O(n^2)$.

The bound on the number of holes plays a role in the following result, whose
proof is given in the appendix.

Lemma 9. The complexity of the union $\bigcup R_l$ is $O((n + b)\,n\sqrt{b \log n} + n^2 \log n)$.

Adding the short hexagonal racetracks of $H_s$ does not increase the asymptotic
complexity of the union.

Corollary 1. The complexity of the union $\bigcup R_l, H_s$ is $O((n + b)\,n\sqrt{b \log n} + n^2 \log n)$.

The following interesting theorem combines the union complexity of discs and
rectangles. It is a variation on a result by Pach and Sharir [17]. We emphasize
that a unit-width rectangle is a rectangle whose shortest side has unit length.
The theorem provides the key to our bound for the complexity of $\bigcup R_l, S$.

Theorem 3. Let $R$ be a set of $m$ unit-width rectangles with union complexity
$O(f(m))$, and let $S$ be a set of $n$ unit-diameter discs. Then the complexity of the union of
the rectangles from $R$ and the discs from $S$ is $O(f(m) + n)$.

There are simple examples that show that the theorem is not true if discs
can be arbitrarily large, or rectangles arbitrarily short. In both cases a construction
of complexity $\Omega(nm)$ can be made. These examples and a proof of the theorem
are given in the full paper.

We now apply the result of Lemma 9 in Theorem 3 and obtain a complexity 
bound on the union of the rectangles in Ri and the discs in S, which leads to 
the following result. 

Corollary 2. The complexity of the union $\bigcup R_l, S$ is $O((n + b)\,n\sqrt{b \log n} + n^2 \log n)$.
It remains to bound the complexity of $\bigcup C_s, S$. The shapes in $C_s \cup S$ are all
fat and have comparable sizes. Let $\tau$ be an equilateral triangle whose sides have
length $\varepsilon\sqrt{3}$. For each shape $f \in C_s \cup S$ and each point $p \in \partial f$, we can place
$\tau$ fully inside $f$ while one of its corners coincides with $p$. Moreover, we observe
that the diameter of any $f \in C_s \cup S$ is between $2\varepsilon$ and $4\varepsilon$. Efrat [5] states that
in this case the union complexity is $O(\lambda_{s+2}(|C_s \cup S|))$, where $s$ is the maximum
number of boundary intersections of any two shapes from $C_s \cup S$ and $\lambda_{s+2}(\cdot)$ is the
maximum length of a Davenport-Schinzel sequence of order $s+2$.

Lemma 10. The complexity of the union $\bigcup C_s, S$ is $O(\lambda_6(n^2))$.

The bounds on the complexities of the unions $\bigcup R_l, H_s$, $\bigcup R_l, S$, and $\bigcup C_s, S$
imply the following bound for the $\varepsilon$-thick Minkowski sum.

Lemma 11. The complexity of the $\varepsilon$-thick Minkowski sum of two polygons with
$n$ vertices is $O((n + b)\,n\sqrt{b \log n} + n^2 \log n)$.

4.2 Algorithms 

The approximate Minkowski sum of two simple polygons can be computed in
the same way as the approximate union of lines. We determine for each of the
$O(b^2)$ points of an appropriately defined grid whether it is inside or outside the
Minkowski sum. There are several ways to do this.

If $b$ is small, we can test each grid point without preprocessing in $O(n)$ time.
The test whether a given point (which we can assume to be the origin after
a translation) lies in the Minkowski sum of two polygons $P$ and $Q$ reduces to
the intersection test for the two simple polygons $P$ and $-Q$ (which is $Q$ mirrored in
the origin). This in turn can be solved by testing whether a (combined) polygon
has self-intersections: compute a line segment $s$ that connects the boundaries
of $P$ and $-Q$ without intersecting $P$ and $-Q$ elsewhere, and convert $s$ into a
narrow passage to make a single polygon that combines $P$ and $-Q$. The test
whether this combined polygon is simple determines whether $P$ and $-Q$
intersect, and therefore whether the origin lies in the Minkowski sum of $P$ and
$Q$. The simplicity test takes linear time as a by-product of Chazelle's linear-time
triangulation algorithm for simple polygons [2]. Concluding, we can compute an
approximate Minkowski sum of two simple polygons with $n$ vertices in total in
$O(nb^2)$ time.
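For experimenting with the grid-based approach, here is a minimal Python sketch. It is a stand-in under stated assumptions, not the method above: instead of Chazelle's linear-time simplicity test it checks every pair of edges of $P$ and $x - Q$ (quadratic per grid point), plus two point-in-polygon tests for the nested case. It relies on $x \in P \oplus Q \iff P \cap (x - Q) \neq \emptyset$, the translated form of the reduction to $P$ and $-Q$. All function names are hypothetical.

```python
def _orient(p, q, r):
    """Twice the signed area of triangle pqr (> 0 for a left turn)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _on_segment(p, q, r):
    """For collinear r: does r lie in the bounding box of segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, q1, q2):
    """Closed segments p1p2 and q1q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True                                   # proper crossing
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def point_in_polygon(pt, poly):
    """Even-odd rule by horizontal ray casting."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def _edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))


def polygons_intersect(P, Q):
    """Brute force: some pair of edges meets, or one polygon lies inside the other."""
    if any(segments_intersect(a, b, c, d) for a, b in _edges(P) for c, d in _edges(Q)):
        return True
    return point_in_polygon(P[0], Q) or point_in_polygon(Q[0], P)


def in_minkowski_sum(x, P, Q):
    """x lies in P + Q iff P meets x - Q (Q mirrored in the origin, then translated by x)."""
    shifted = [(x[0] - qx, x[1] - qy) for qx, qy in Q]
    return polygons_intersect(P, shifted)


def approximate_minkowski_grid(P, Q, origin, D, eps):
    """Indices (i, j) of the grid points origin + (i*eps, j*eps) that lie in P + Q."""
    k = int(D / eps) + 1
    return {(i, j) for i in range(k) for j in range(k)
            if in_minkowski_sum((origin[0] + i * eps, origin[1] + j * eps), P, Q)}
```

With two unit squares, $P \oplus Q = [0, 2]^2$, so on a grid of spacing $0.5$ covering $[-0.5, 2.5]^2$ exactly the $5 \times 5$ grid points in $[0, 2]^2$ are reported.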

For larger values of $b$ it becomes more efficient to view the Minkowski sum of
$P$ and $Q$ as the union of $O(n^2)$ triangles. We can store them in a partition tree
such that point containment queries can be answered efficiently [1]. Using the
trade-off between preprocessing and total query time for range searching, we obtain an
$O(n^2 + b^2 + (nb)^{4/3+\delta})$ time algorithm for any constant $\delta > 0$. In the same way
as for the approximate union of lines, we can make the approximate Minkowski
sum a connected subset of the plane.
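The exponent $4/3$ comes out of the same balancing as in Section 3, sketched here for $N = O(n^2)$ triangles and $O(b^2)$ grid-point queries, assuming the same trade-off ($O(m^{1+\delta})$ preprocessing, $O(N^{1+\delta}/\sqrt{m})$ per query, $N \le m \le N^2$). When the balanced $m$ falls outside $[N, N^2]$, one of the additive terms $n^2$ or $b^2$ takes over (up to the usual $\delta$-factors).

```latex
T(m) = O\!\left(m^{1+\delta}\right) + O(b^{2})\cdot O\!\left(\frac{N^{1+\delta}}{\sqrt{m}}\right),
\qquad N = O(n^{2}),
\qquad
m^{3/2} = b^{2} N \;\Longrightarrow\; m = b^{4/3} N^{2/3} = (nb)^{4/3},
\qquad
T = O\!\left(n^{2} + b^{2} + (nb)^{4/3+\delta}\right).
```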

Theorem 4. An $\varepsilon$-approximate Minkowski sum of two simple polygons with $n$
vertices in total, inside a square of diameter $D$, has complexity $O(b^2)$ and can
be computed in $O(nb^2)$ and in $O(n^2 + b^2 + (nb)^{4/3+\delta})$ time, for any $\delta > 0$,
where $b = D/\varepsilon = O(n^2)$.
To compute the thick Minkowski sum of two simple polygons $P$ and $Q$ we use
the ray shooting approach of Section 3. However, we cannot do exactly the same,
because a ray that enters the union of rectangles and discs is not guaranteed to
stay inside the union over a length of at least $\varepsilon$. Furthermore, we have to deal with
the circular parts of the thick Minkowski sum.

For convenience we will compute the Minkowski sum of every vertex of $P$,
edge of $Q$, and $\Delta_\varepsilon$, and of every vertex of $Q$, edge of $P$, and $\Delta_\varepsilon$. The union of these
$O(n^2)$ racetracks is a subset of the thick Minkowski sum. It has the same asymptotic
complexity, and from it the true thick Minkowski sum can be constructed
easily by filling holes and removing boundary edges of these holes.

Let $C$ again be the set of $O(n^2)$ racetracks. We consider each racetrack to be
the union of two discs and a rectangle as before, giving a set $S$ of $O(n^2)$ discs and
a set $R$ of $O(n^2)$ rectangles. The union of the discs has complexity $O(n^2)$ and
can be constructed in $O(n^2 \log^2 n)$ time by the algorithm of Kedem et al. [12].
The union of the rectangles has complexity $O((n + b)\,n\sqrt{b \log n} + n^2 \log n)$, and
we show next how to compute it. Recall that every rectangle has width $2\varepsilon$, and
the length can vary from nearly $0$ to linear in $D$. Let $R_s$ be the subset of (short)
rectangles of length at most $4\varepsilon$ and let $R_l$ be the rectangles of length greater than $4\varepsilon$. For
each rectangle in $R_l$, we cut off two squares of size $2\varepsilon$ by $2\varepsilon$, one from either
end. Let $R'_l$ be the set of shortened rectangles from $R_l$, and let $R'_s$ be the union
of $R_s$ and the $2|R_l|$ squares that were cut off.

We preprocess $R'_s$ as follows. Compute the union of the rectangles in $R'_s$
(through their hexagonal racetracks) explicitly with the algorithm of Kedem et
al. in $O(\lambda_{14}(n^2) \log^2 n)$ time. It has complexity $O(\lambda_{14}(n^2))$ by Efrat's result [5]
(two hexagons can intersect 12 times). Preprocess the edges of the boundary of
the union into a ray shooting structure. The start of the query ray can be inside
or outside the union.

We preprocess $R'_l$ as in Section 3: the main tree is a partition tree in which
all rectangles that contain a query point (the start of the query ray) can be selected
in a small number of canonical subsets, represented by nodes of the tree. For
every node of the tree we store an associated structure for ray shooting ‘from
the other side’, so we can find the last exit of the rectangles of $R'_l$.

A query is always performed in both structures. There are four cases for the
start $q$ of the query ray:

1. If $q \notin \bigcup R'_s$ and $q \notin \bigcup R'_l$, then the (double) ray shooting query will give the
next vertex along the ray, which is in the union $\bigcup R$ as well.

2. If $q \in \bigcup R'_s$ and $q \notin \bigcup R'_l$, then either the query ray exits $\bigcup R'_s$ first, in which
case we find a vertex of the union, or the query ray enters $\bigcup R'_l$ first. In the
latter case, the next ray shooting query is done from the exit point of $\bigcup R'_s$.

3. If $q \notin \bigcup R'_s$ and $q \in \bigcup R'_l$, then either the query ray exits the subset of the
rectangles of $R'_l$ that contain it first, or the query ray enters $\bigcup R'_s$ first. In the
latter case, the next ray shooting query can start at the exit of the subset from
$\bigcup R'_l$. In the former case, we either exit $\bigcup R'_l$ and find a vertex of the union, or
we stay inside $\bigcup R'_l$ and the next query will be of type 3 again.

4. If $q \in \bigcup R'_s$ and $q \in \bigcup R'_l$, then either the query ray exits $\bigcup R'_s$ last, or the
query ray exits the subset of the rectangles of $R'_l$ that contain it last. We will
shoot from the furthest point hit, and the query can be of types 1, 2, 3, or 4.

We will perform the ray shooting queries along the sides of the rectangles in $R$,
so we determine $\bigcup R$ this way. We can prove that a ray shooting query either finds a
vertex of the union or, a constant number of queries later, the starting point of
the query ray has advanced, in a way similar to Lemma 5.
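To make the four cases concrete, the Python sketch below models one query ray in one dimension: $\bigcup R'_s$ becomes a list of disjoint half-open intervals and each rectangle of $R'_l$ becomes an interval along the ray. This 1D model, the brute-force membership tests standing in for the two ray shooting structures, and all names are assumptions. Starting inside the union, cases 2, 3 and 4 all continue from an exit point (of $\bigcup R'_s$, of the subset of $R'_l$ containing $q$, or the further of the two), and case 1 is reached exactly when the ray has left the union.

```python
import math


def _containing(intervals, t):
    """Half-open intervals [s, e) that contain the ray parameter t."""
    return [(s, e) for s, e in intervals if s <= t < e]


def walk_to_exit(q, short_union, long_rects):
    """Follow the four-case query rule from q until the ray leaves the union.

    short_union: disjoint half-open intervals, standing in for the precomputed
                 union of R'_s along the ray.
    long_rects:  possibly overlapping half-open intervals, one per rectangle
                 of R'_l crossed by the ray.
    Returns (exit_point, number_of_queries).
    """
    queries = 0
    while True:
        in_s = _containing(short_union, q)
        in_l = _containing(long_rects, q)
        if not in_s and not in_l:          # case 1: we stand on a vertex of the union
            if queries == 0:
                raise ValueError("q must lie inside the union")
            return q, queries
        queries += 1
        exit_s = in_s[0][1] if in_s else -math.inf
        # last exit of the subset of long rectangles that contain q
        exit_l = max((e for _, e in in_l), default=-math.inf)
        # case 2 continues from exit_s, case 3 from exit_l, case 4 from the further one
        q = max(exit_s, exit_l)
```

For example, with $\bigcup R'_s = [0, 2)$ and rectangles $[1, 5)$ and $[4, 7)$, a walk from $0$ takes three queries (cases 2, 3, 3) and stops at $7$.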

Lemma 12. If five consecutive ray shooting queries stay fully inside $\bigcup R$, then
the distance between the start of the first and the end of the fifth ray is at least $2\varepsilon$.

We showed that the total number of queries is bounded by $O(n^2 b)$ plus
the union complexity of $\bigcup R$. Since we assume $b = O(n^2)$, we do $O(n^2 b + (n + b)\,n\sqrt{b \log n} + n^2 \log n)$
queries, and balancing the preprocessing time against the
total query time bounds the time spent on ray shooting to compute $\bigcup R$.

After computing the union $\bigcup R$ of rectangles and the union $\bigcup S$ of discs, we
compute the union of unions with a simple overlay, for example by a plane sweep.
Every new intersection point appears in the thick Minkowski sum, so the overlay
step takes $O((n^2\sqrt{b} + nb\sqrt{b} + \lambda_{14}(n^2)) \log^{1.5} n + n^2 \log^2 n)$ time. This is always
dominated by the total ray shooting time.

Theorem 5. The $\varepsilon$-thick Minkowski sum of two simple polygons with $n$ vertices
in total inside a square of diameter $D$ has complexity $O((n + b)\,n\sqrt{b \log n} +
n^2 \log n)$ and can be constructed in time, for any $\delta > 0$,
where $b = D/\varepsilon = O(n^2)$.

5 Conclusions and Open Problems 

We have studied the fundamental geometric concept of Minkowski sums in a 
context that bears resemblance to the study of realistic input models. We have 
presented new combinatorial and algorithmic results on approximate and thick 
unions of lines, and approximate and thick Minkowski sums. The results are 
related to fatness and realistic input models, but we take the fatness parameter
into account explicitly. Although we studied unions of lines and Minkowski
sums, our methods can be used for approximate and thick unions of triangles
as well. For $\varepsilon$-approximate unions of $n$ triangles in a region of interest with
diameter $D$, the complexity is $O(b^2)$ in the worst case, and we can compute it in
O(n time, where $\delta > 0$ and $b = D/\varepsilon = O(n)$. For thick unions
of triangles, we get an + n) and $O(n \log n + b\sqrt{bn} \log n)$ complexity bound
and an time algorithm.

The main open problems are closing the gaps left in the complexity bounds
and improving the running time of the algorithms. We believe that in any case
the $\sqrt{\log n}$ factor in the upper bound for thick Minkowski sums is an artifact
of the proof. It would also be interesting to extend the combination theorem
for unions of unit-width rectangles and unit-diameter discs to the case where the
radius of the discs may be smaller.
Abstract. We propose a novel randomized algorithm for computing a
dominating-set-based clustering in wireless ad-hoc and sensor networks. The algorithm works
under a model which captures the characteristics of the set-up phase of such 
multi-hop radio networks: asynchronous wake-up, the hidden terminal problem, 
and scarce knowledge about the topology of the network graph. When modelling 
the network as a unit disk graph, the algorithm computes a dominating set in 
polylogarithmic time and achieves a constant approximation ratio. 



1 Introduction 

Ad-hoc and sensor networks are formed by autonomous nodes communicating via radio, 
without any additional infrastructure. In other words, the communication “infrastructure”
is provided by the nodes themselves. When deployed, the nodes initially
form an unstructured radio network, which means that no reliable and efficient com- 
munication pattern has been established yet. Before any reasonable communication can 
be carried out, the nodes must establish a media access control (MAC) scheme which 
provides reliable point-to-point connections to higher-layer protocols and applications. 
The problem of setting up an initial structure in radio networks is of great importance in 
practice. Even in a single-hop ad-hoc network such as Bluetooth and for a small number 
of devices, the initialization tends to be slow. Clearly, in a multi-hop scenario with many 
nodes, the time consumption for establishing a communication pattern increases even 
further. In this paper, we address this initialization process. 

One prominent approach to solving the problem of bringing structure into a multi-hop 
radio network is a clustering, in which each node in the network is either a cluster-head 
or has a cluster-head within its communication range (such that cluster-heads can act 
as coordination points for the MAC scheme) [1,4,6]. When we model a multi-hop radio 
network as a graph $G = (V, E)$, this clustering can be formulated as a classic graph
theory problem: in a graph, a dominating set is a subset $S \subseteq V$ of nodes such that for
every node $v$, either a) $v \in S$ or b) $v' \in S$ for a direct neighbor $v'$ of $v$. As it is desirable
to compute a dominating set with few dominators, we study the minimum dominating
set (MDS) problem, which asks for a dominating set of minimum cardinality.

The computation of dominating sets for the purpose of structuring networks has been 
studied extensively and a variety of algorithms have been proposed, e.g. [4,7,9,10,12]. 
To the best of our knowledge, all these algorithms operate on an existing MAC layer, 
providing point-to-point connections between neighboring nodes. While this is valid in 
structured networks, it is certainly an improper assumption for the initialization phase. In 
fact, by assuming point-to-point connections, many vital problems arising in unstructured 
networks (collision detection, asynchronous wake-up, or the hidden terminal problem)
are simply abstracted away. Consequently, none of the existing dominating set algorithms 
helps in the initialization process of such networks. 

We are interested in a simple and practical algorithm which quickly computes a 
clustering from scratch. Based on this initial clustering, the MAC layer can subsequently 
be established. An unstructured multi-hop radio network can be modelled as follows: 

- The network is multi-hop, that is, there exist nodes that are not within their mutual 
transmission range. Being multi-hop complicates things since some of the neigh- 
bors of a sending node may receive a transmission, while others are experiencing 
interference from other senders and do not receive the transmission. 

- The nodes do not feature a reliable collision detection mechanism [2,5,8]. In many 
scenarios not assuming any collision detection mechanism is realistic. Nodes may 
be tiny sensors in a sensor network where equipment is restricted to the minimum 
due to limitations in energy consumption, weight, or cost. The absence of colli- 
sion detection includes sending nodes, i.e., the sender does not know whether its 
transmission was received successfully or whether it caused a collision. 

- Our model allows nodes to wake up asynchronously. In a multi-hop environment, it
is realistic to assume that some nodes wake up (e.g. become deployed, or switched
on) later than others. Consequently, nodes do not have access to a global clock.
Asynchronous wake-up rules out ALOHA-like MAC schemes, as these would
result in a linear runtime in case only one single node wakes up for a long time.

- Nodes have only limited knowledge about the total number of nodes in the network 
and no knowledge about the nodes’ distribution or wake-up pattern. 

In this paper, we present a randomized algorithm which computes an asymptotically 
optimal clustering for this harsh model in polylogarithmic time only. Section 2 gives an 
overview of relevant previous work. Section 3 introduces our model as well as some
well-known facts. The algorithm is developed and analyzed in Sections 4 and 5. 

2 Related Work 

The problem of finding a minimum dominating set was proven to be NP-hard. Furthermore,
it has been shown in [3] that the best possible approximation ratio for this
problem is $\ln \Delta$, where $\Delta$ is the highest degree in the graph, unless NP has deterministic
$n^{O(\log \log n)}$-time algorithms. For unit disk graphs, the problem remains NP-hard, but
constant factor approximations are possible. Several distributed algorithms have been 
proposed, both for general graphs [7,9,10] and the Unit Disk Graph [4,12]. All the above 
algorithms assume point-to-point connections between neighboring nodes and are thus 
unsuitable in the context of initializing radio networks. 

A model similar to the one used in this paper has previously been studied in the 
context of analyzing the complexity of broadcasting in multi-hop radio networks, e.g. 
[2]. A striking difference to our model is that throughout the literature on broadcast in 
radio networks, synchronous wake-up is considered, i.e. all nodes have access to a global 
clock and start the algorithm simultaneously. A model featuring asynchronous wake-up 
has been studied in recent papers on the wake-up problem in single-hop networks [5,8]. 
In comparison to our model, these papers define a much weaker notion of asynchrony.
Particularly, it is assumed that sleeping nodes are woken up by a successfully transmitted 
message. In a single-hop network, the problem of waking up all nodes thus reduces to 
analyzing the number of time-slots until one message is successfully transmitted. While 
this definition of asynchrony leads to theoretically interesting problems and algorithms, 
it does not closely reflect reality. 



3 Model 



We model the multi-hop radio network with the well-known Unit Disk Graph (UDG). In a
UDG $G = (V, E)$, there is an edge $\{u, v\} \in E$ iff the Euclidean distance between $u$ and
$v$ is at most 1. Nodes may wake up asynchronously at any time. We call a node sleeping
before its wake-up, and active thereafter. Sleeping nodes can neither send nor receive
any messages, regardless of their being within the transmission range of a sending node.
Nodes do not have any a-priori knowledge about the topology of the network. They
only have an upper bound $\hat{n}$ on the number of nodes $n = |V|$ in the graph. While $n$ is
unknown, all nodes have the same estimate $\hat{n}$. As shown in [8], without any estimate of
$n$ and in the absence of a global clock, every algorithm requires at least $\Omega(n/\log n)$ time
until one single message can be transmitted without collision.

While our algorithm does not rely on synchronized time-slots in any way, we do 
assume time to be divided into time-slots in the analysis section. This simplification is 
justified by the trick used in the analysis of slotted vs. unslotted ALOHA [11], i.e., a
single packet can cause interference in no more than two consecutive time-slots. Thus,
an analysis in an “ideal” setting with synchronized time-slots yields a result which is
better by only a constant factor than one in the more realistic unslotted setting.

We assume that nodes have three independent communication channels $\Gamma_1$, $\Gamma_2$, and
$\Gamma_3$, which may be realized with an FDMA scheme. In each time-slot, a node can either
send or not send. Nodes do not have a collision detection mechanism, that is, nodes are
unable to distinguish between the situation in which two or more neighbors are sending
and the situation in which no neighbor is sending. A node receives a message on channel
$\Gamma$ in a time-slot only if exactly one neighbor has sent a message in this time-slot on $\Gamma$. A
sending node does not know how many (if any at all!) neighbors have correctly received
its transmission. The variables $p_k$ and $q_k$ denote the probabilities that node $k$ sends a
message in a given time-slot on $\Gamma_1$ and $\Gamma_2$, respectively. Unless otherwise stated, we use
the term sum of sending probabilities to refer to the sum of sending probabilities on $\Gamma_1$.
We conclude this section with two facts. The first was proven in [8] and the second can
be found in standard mathematical textbooks.

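Fact 1. Given a set of probabilities $p_1, \ldots, p_n$ with $\forall i: p_i \in [0, \frac{1}{2}]$, the following
inequalities hold:
$$\left(\tfrac{1}{4}\right)^{\sum_{k=1}^{n} p_k} \;\leq\; \prod_{k=1}^{n} (1 - p_k) \;\leq\; \left(\tfrac{1}{e}\right)^{\sum_{k=1}^{n} p_k}.$$

Fact 2. For all $n, t$ with $n \geq 1$ and $|t| \leq n$, $e^t\left(1 - \frac{t^2}{n}\right) \leq \left(1 + \frac{t}{n}\right)^n \leq e^t$.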
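Both facts are easy to spot-check numerically. The Python sketch below (function names hypothetical) evaluates the three sides of each chain of inequalities; checking random instances this way illustrates the facts but does not prove them.

```python
import math
import random


def check_fact1(ps):
    """(lower, product, upper) for Fact 1; every p must lie in [0, 1/2]."""
    s = sum(ps)
    return 0.25 ** s, math.prod(1 - p for p in ps), math.exp(-s)


def check_fact2(n, t):
    """(lower, middle, upper) for Fact 2; requires n >= 1 and |t| <= n."""
    return math.exp(t) * (1 - t * t / n), (1 + t / n) ** n, math.exp(t)


if __name__ == "__main__":
    rng = random.Random(0)
    ps = [rng.uniform(0.0, 0.5) for _ in range(10)]
    print(check_fact1(ps))
    print(check_fact2(10, 1.0))
```

The lower bound of Fact 1 is tight at $p_k = 1/2$, where $1 - p_k = 4^{-p_k} = 1/2$.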
Algorithm 1 Dominator Algorithm

decided := dominator := false;
upon wake-up do:
1: for $j := 1$ to $\delta \cdot \lceil \log \hat{n} \rceil$ by 1 do
2:   if message received in current time-slot then decided := true; fi
3: end for
4: for $j := \lceil \log \hat{n} \rceil$ to 0 by $-1$ do
5:   $p := 1/2^{j+\beta}$;
6:   for $i := 1$ to $\delta$ by 1 do
7:     $b^{(1)} := 0$; $b^{(2)} := 0$; $b^{(3)} := 0$;
8:     if not decided then
9:       $b^{(1)} := 1$, with probability $p$;
10:      if $b^{(1)} = 1$ then dominator := true;
11:      else if message received in current time-slot then decided := true;
12:      fi
13:    end if
14:    if dominator then
15:      $b^{(2)} := 1$, with probability $q$; $b^{(3)} := 1$, with probability $q/\log \hat{n}$;
16:    end if
17:    if $b^{(1)} = 1$ then send message on $\Gamma_1$ fi
18:    if $b^{(2)} = 1$ then send message on $\Gamma_2$ fi
19:    if $b^{(3)} = 1$ then send message on $\Gamma_3$ fi
20:  end for
21: end for
22: if not decided then dominator := decided := true; fi
23: if dominator then
24:   loop
25:     send message on $\Gamma_2$ and $\Gamma_3$, with probability $q$ and $q/\log \hat{n}$, respectively;
26:   end loop
27: end if



4 Algorithm 

A node starts executing the dominator algorithm (Algorithm 1) upon waking up. In the 
first phase (lines 1 to 3), nodes wait for messages (on all channels) without sending 
themselves. The reason is that nodes waking up late should not interfere with already 
existing dominators. Thus, a node first listens for existing dominators in its neighborhood 
before actively trying to become dominator itself. 

The main part of the algorithm (starting in line 4) works in rounds, each of which
contains $\delta$ time-slots. In every time-slot, an undecided node sends with probability $p$ on channel $\Gamma_1$.
Starting from a very small value, this sending probability $p$ is doubled (lines 4 and 5) in
every round. When sending its first message, a node becomes a dominator and, in addition
to its sending on channel $\Gamma_1$, it starts sending on channels $\Gamma_2$ and $\Gamma_3$ with probability $q$
and $q/\log \hat{n}$, respectively. Once a node becomes a dominator, it will remain so for the
rest of the algorithm's execution. For the algorithm to work properly, we must prevent the
sum of sending probabilities on channel $\Gamma_1$ from reaching too high values. Otherwise,
too many collisions will occur, leading to a large number of dominators. Hence, upon
receiving its first message (without collision) on any channel, a node becomes decided
and stops sending on $\Gamma_1$. Being decided means that the node is covered by a dominator.

Thus, the basic intuition is that nodes, after some initial listening period, compete to
become dominator by exponentially increasing their sending probability on $\Gamma_1$. Channels
$\Gamma_2$ and $\Gamma_3$ then ensure that the number of further dominators emerging in the
neighborhood of an already existing dominator is bounded.

The parameters $q$, $\beta$, and $\delta$ of the algorithm are chosen so as to optimize the results
and guarantee that all claims hold with high probability. In particular, we define
$q := 1/(2^\beta \cdot \lceil \log \hat{n} \rceil)$, $\delta := \lceil \log(\hat{n}) / \log(503/502) \rceil$, and $\beta := 6$. The parameter $\delta$ is chosen
large enough to ensure that with high probability, there is a round in which at least one
competing node will send without collision. The parameter $q$ is chosen such that during
the “waiting time-slots”, a new node will receive a message from an existing dominator.
Finally, $\beta$ maximizes the probability of a successful execution of the algorithm. The
algorithm's correctness and time complexity (defined as the number of time-slots of a
node between wake-up and decision) follow immediately:

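Theorem 1. The algorithm computes a correct dominating set. Moreover, every node
decides whether or not to become dominator in time $O(\log^2 \hat{n})$.

Proof. The first for-loop is executed $\delta \cdot \lceil \log \hat{n} \rceil$ times. The two nested loops of
the algorithm are executed $\lceil \log \hat{n} \rceil + 1$ and $\delta$ times, respectively. Since $\delta = O(\log \hat{n})$,
every node decides within $O(\delta \log \hat{n}) = O(\log^2 \hat{n})$ time-slots. After these two
loops, all remaining undecided nodes decide to become dominator. Every other node became
decided by receiving a message from a neighbor, and only dominators send messages: a node
becomes a dominator in the time-slot in which it first sends on $\Gamma_1$, and only dominators
send on $\Gamma_2$ and $\Gamma_3$. Hence every node is a dominator or has a dominator as a neighbor.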
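The following Python sketch simulates Algorithm 1 in synchronized time-slots on a unit disk graph, to make the control flow and the role of the three channels concrete. Several details are assumptions rather than statements from the text: a node counts a message as received if exactly one neighbor sends on at least one of the three channels; $q$ is read as $1/(2^\beta \lceil \log \hat{n} \rceil)$; logarithms are base 2; $\hat{n} \geq 2$; and `delta` can be overridden because $\lceil \log(\hat{n})/\log(503/502) \rceil$ runs into the hundreds even for tiny $\hat{n}$. All names are hypothetical.

```python
import math
import random


def unit_disk_graph(points):
    """Adjacency lists: u and v are adjacent iff their distance is at most 1."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(points[u], points[v]) <= 1.0:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def is_dominating(adj, dom):
    """Every node is a dominator or has a dominator neighbor."""
    return all(v in dom or any(u in dom for u in adj[v]) for v in range(len(adj)))


def simulate_dominators(points, wake, n_hat, beta=6, delta=None, seed=0):
    """Slotted simulation of Algorithm 1; returns (dominators, adjacency)."""
    rng = random.Random(seed)
    adj = unit_disk_graph(points)
    L = math.ceil(math.log2(n_hat))                 # ceil(log n_hat), needs n_hat >= 2
    if delta is None:                               # the paper's (large) choice
        delta = math.ceil(math.log(n_hat) / math.log(503 / 502))
    q = 1.0 / (2 ** beta * L)                       # assumed reading of q
    listen_len, main_len = delta * L, delta * (L + 1)
    n = len(points)
    decided, dominator = [False] * n, [False] * n
    for t in range(max(wake) + listen_len + main_len + 1):
        senders = (set(), set(), set())             # Gamma_1, Gamma_2, Gamma_3
        listening = []
        for v in range(n):
            s = t - wake[v]                         # local slot since wake-up
            if s < 0:
                continue                            # still sleeping
            if s < listen_len:                      # lines 1-3: listen only
                listening.append(v)
                continue
            if s < listen_len + main_len:           # lines 4-21
                j = L - (s - listen_len) // delta   # j counts down once per round
                p = 1.0 / 2 ** (j + beta)           # line 5: p doubles every round
                if not decided[v]:
                    if rng.random() < p:            # lines 9-10
                        dominator[v] = True
                        senders[0].add(v)
                    else:                           # line 11
                        listening.append(v)
            elif not decided[v]:                    # line 22
                decided[v] = dominator[v] = True
            if dominator[v]:                        # lines 14-19 and 23-26
                if rng.random() < q:
                    senders[1].add(v)
                if rng.random() < q / math.log2(n_hat):
                    senders[2].add(v)
        for v in listening:                         # reception without collision
            if any(sum(u in ch for u in adj[v]) == 1 for ch in senders):
                decided[v] = True
    return {v for v in range(n) if dominator[v]}, adj
```

Because every sender on $\Gamma_1$ becomes a dominator and only dominators use $\Gamma_2$ and $\Gamma_3$, the returned set is a dominating set on every run, whatever `delta` is; the size of the set, which Section 5 analyzes, does depend on the parameters.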



5 Analysis 

In this section, we show that the expected number of dominators in the network is within
a constant factor of an optimal solution. As argued in Section 3, we can simplify the analysis by
assuming that all nodes operate in synchronized time-slots.

We cover the plane with circles $C_i$ of radius $r = 1/2$ by a hexagonal lattice, as shown
in Figure 1. Let $D_i$ be the circle centered at the center of $C_i$ having radius
$R = 3/2$. It can be seen in Figure 1 that $D_i$ (fully or partially) covers 19
smaller circles $C_j$. Since $C_i$ has diameter 1, every node in a circle $C_i$ can hear all other
nodes in $C_i$. Since any node within distance 1 of a point of $C_i$ lies within distance $3/2$ of
the center of $C_i$, nodes outside $D_i$ are not able to cause a collision in $C_i$.
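The count of 19 can be checked with a few lines of Python, assuming the standard covering lattice for circles of radius $r$: a triangular lattice with spacing $r\sqrt{3}$. A circle $C_j$ meets $D_i$ exactly when the distance between their centers is less than $R + r = 2$. The function name is hypothetical.

```python
import math


def circles_meeting_disc(r=0.5, R=1.5, reach=6):
    """Count lattice circles of radius r that intersect the disc of radius R
    centered at the origin, for a triangular covering lattice of spacing r*sqrt(3)."""
    d = r * math.sqrt(3)                   # lattice spacing of the covering
    count = 0
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            x = d * (i + j / 2)            # lattice basis (d, 0) and (d/2, d*sqrt(3)/2)
            y = d * j * math.sqrt(3) / 2
            if math.hypot(x, y) < R + r:   # the two discs overlap
                count += 1
    return count
```

The centers within distance $2$ form the shells at distances $0$, $\sqrt{3}/2$, $3/2$ and $\sqrt{3}$, with $1 + 6 + 6 + 6 = 19$ lattice points; the next shell is at $\sqrt{21}/2 \approx 2.29$.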

Fig. 1. Circles $C_i$ and $D_i$

The proof works as follows. We first bound the sum of sending probabilities in
a circle $C_i$. This leads to an upper bound on the number of collisions in a circle
before at least one dominator emerges. Next, we give a probabilistic bound on the
number of sending nodes per collision. In the last step, we show that nodes that are
already covered when they wake up do not become dominators. All these claims hold with high
probability. Note that for the analysis, it is sufficient to assume $\hat{n} = n$, because solving
minimum dominating set for $n' < n$ cannot be more difficult than for $n$. If it were, the
imaginary adversary controlling the wake-ups of all nodes could simply decide to let
$n - n'$ nodes sleep infinitely long, which is indistinguishable from having $n'$ nodes.

Definition 1. Consider a circle $C_i$. Let $t$ be a time-slot in which a message is sent by
a node $v \in C_i$ on $\Gamma_1$ and received (without collision) by all other nodes in $C_i$. We say
that circle $C_i$ clears itself in time-slot $t$. Let $t_0$ be the first such time-slot. We say that $C_i$
terminates itself in time-slot $t_0$. For all time-slots $t' > t_0$, we call $C_i$ terminated.



Definition 2. Let $s(t) := \sum_{k \in C_i} p_k$ be the sum of sending probabilities on $\Gamma_1$ in $C_i$ at
time $t$. We define the time-slot $t_i^j$ so that for the $j$-th time in $C_i$, we have $s(t_i^j - 1) < \frac{1}{2^\beta}$
and $s(t_i^j) \geq \frac{1}{2^\beta}$. We further define the interval $I_i^j := [t_i^j, \ldots, t_i^j + \delta - 1]$.

In other words, $t_i^1$ is the time-slot in which the sum of sending probabilities in $C_i$
exceeds $\frac{1}{2^\beta}$ for the first time and $t_i^j$ is the time-slot in which this threshold is surpassed
for the $j$-th time in $C_i$.

Lemma 1. For all time-slots $t' \in \mathcal{I}_j^i$, the sum of sending probabilities in $C_i$ is bounded by $\sum_{k \in C_i} p_k < \frac{3}{2^\beta}$.

Proof. According to the definition of $t_j^i$, the sum of sending probabilities $\sum_{k \in C_i} p_k$ at time $t_j^i - 1$ is less than $\frac{1}{2^\beta}$. All nodes which are active at time $t_j^i$ will double their sending probability $p_k$ exactly once in the following $\delta$ time-slots. Previously inactive nodes may wake up during that interval. There are at most $n$ such newly active nodes and each of them will send with the initial sending probability in the given interval. In $\mathcal{I}_j^i$, we get
$$\sum_{k \in C_i} p_k < 2 \cdot \frac{1}{2^\beta} + n \cdot \frac{1}{2^\beta n} = \frac{3}{2^\beta}.$$
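The doubling argument of Lemma 1 can be checked numerically. The sketch below is a minimal Python simulation under assumptions not spelled out in this excerpt: a node entering its sending phase at slot `a` starts with probability $1/(2^\beta n)$, doubles it every $\delta$ slots, and is capped at $1/2^\beta$ (the cap stands in for the node deciding); decisions caused by received messages are ignored. The names `prob` and `worst_sum_after_crossing` are hypothetical. For each slot at which $s(t)$ crosses $1/2^\beta$, it records the largest sum over the following $\delta$ slots, which Lemma 1 says stays below $3/2^\beta$.

```python
import random
from fractions import Fraction


def prob(t: int, a: int, n: int, beta: int, delta: int) -> Fraction:
    """Sending probability at slot t of a node whose sending phase starts at slot a.

    Hypothetical schedule: 1/(2^beta * n) initially, doubled every delta slots,
    capped at 1/2^beta. Before slot a the node does not send.
    """
    if t < a:
        return Fraction(0)
    p = Fraction(2 ** ((t - a) // delta), 2**beta * n)
    return min(p, Fraction(1, 2**beta))


def worst_sum_after_crossing(starts, n, beta, delta, horizon):
    """Largest s(t') over all windows [t, t + delta - 1] that begin at a crossing slot t."""
    threshold = Fraction(1, 2**beta)
    s = [sum(prob(t, a, n, beta, delta) for a in starts) for t in range(horizon)]
    worst = Fraction(0)
    for t in range(1, horizon):
        if s[t - 1] < threshold <= s[t]:
            # Within delta slots, every already active node doubles at most once and
            # every newly started node still sends with the initial probability.
            worst = max(worst, max(s[t:t + delta]))
    return worst


if __name__ == "__main__":
    random.seed(1)
    n, beta, delta = 16, 6, 4
    starts = [random.randrange(40) for _ in range(n)]
    print(worst_sum_after_crossing(starts, n, beta, delta, 100), "<", Fraction(3, 2**beta))
```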

Using the above lemma, we can formulate a probabilistic bound on the sum of sending probabilities in a circle $C_i$. Intuitively, we show that before the bound can be surpassed, $C_i$ either clears itself or some nodes in $C_i$ become decided such that the sum of sending probabilities decreases.

Lemma 2. The sum of sending probabilities of nodes in a circle $C_i$ is bounded by $\sum_{k \in C_i} p_k \le \frac{3}{2^\beta}$ with probability at least $1 - o(\frac{1}{n^2})$. The bound holds for all $C_i$ in $G$ with probability at least $1 - o(\frac{1}{n})$.

Proof. The proof is by induction over all intervals $\mathcal{I}_j^i$ and the corresponding time-slots $t_j^i$ in ascending order. Lemma 1 states that the sum of sending probabilities in $C_i$ is bounded by $\frac{3}{2^\beta}$ in each interval $\mathcal{I}_j^i$. In the sequel, we show that in $\mathcal{I}_j^i$, the circle $C_i$ either clears itself or the sum of sending probabilities falls back below $\frac{1}{2^\beta}$ with high probability. Note that for all time-slots $t$ not covered by any interval $\mathcal{I}_j^i$, the sum of sending probabilities is below $\frac{1}{2^\beta}$. Hence, the induction over all intervals is sufficient to prove the claim.

Let $t' := t_j^i$ be the very first critical time-slot in the network and let $\mathcal{I}'$ be the corresponding interval. If some of the active nodes in $C_i$ receive a message from a neighboring node, the sum of sending probabilities may fall back below $\frac{1}{2^\beta}$. In this case, the sum obviously does not exceed $\frac{3}{2^\beta}$. If the sum of sending probabilities does not fall back below $\frac{1}{2^\beta}$, the following two inequalities hold for the duration of the interval $\mathcal{I}'$:
$$\frac{1}{2^\beta} \le \sum_{k \in C_i} p_k < \frac{3}{2^\beta} \quad \text{in } C_i \qquad (1)$$
$$0 \le \sum_{k \in C_j} p_k < \frac{3}{2^\beta} \quad \text{in } C_j \in D_i,\ i \ne j \qquad (2)$$

The second inequality holds because $t'$ is the very first time-slot in which the sum of sending probabilities exceeds $\frac{1}{2^\beta}$. Hence, in each $C_j \in D_i$, the sum of sending probabilities is less than $\frac{3}{2^\beta}$ in $\mathcal{I}'$. (Otherwise, one of these circles would have reached $\frac{1}{2^\beta}$ before circle $C_i$, and $t'$ would not be the first time-slot considered.)

We will now compute the probability that $C_i$ clears itself within $\mathcal{I}'$. Circle $C_i$ clears itself when exactly one node in $C_i$ sends and no other node in $D_i \setminus C_i$ sends. The probability $P_0$ that no node in any neighboring circle $C_j \in D_i$, $j \ne i$, sends is
sends is 



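$$P_0 = \prod_{C_j \in D_i,\, j \ne i} \; \prod_{k \in C_j} (1 - p_k) \;\ge\; \prod_{C_j \in D_i,\, j \ne i} \left(\frac{1}{4}\right)^{\sum_{k \in C_j} p_k} \;\overset{\text{Lemma 1}}{\ge}\; \prod_{C_j \in D_i,\, j \ne i} \left(\frac{1}{4}\right)^{\frac{3}{2^\beta}} \;\ge\; \left(\frac{1}{4}\right)^{\frac{18 \cdot 3}{2^\beta}} \qquad (3)$$
The first inequality uses $1 - p \ge (1/4)^p$ for $0 \le p \le 1/2$, and the last one uses that $D_i$ contains at most 18 circles other than $C_i$.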
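Let $P_{suc}$ be the probability that exactly one node in $C_i$ sends:
$$P_{suc} = \sum_{k \in C_i} p_k \prod_{l \in C_i,\, l \ne k} (1 - p_l) \;\ge\; \sum_{k \in C_i} p_k \prod_{l \in C_i} (1 - p_l) \;\ge\; \sum_{k \in C_i} p_k \left(\frac{1}{4}\right)^{\sum_{k \in C_i} p_k} \;\ge\; \frac{1}{2^\beta}\left(\frac{1}{4}\right)^{\frac{1}{2^\beta}}.$$
The last inequality holds because the previous function is increasing in $\sum_{k \in C_i} p_k$ on the range allowed by inequality (1).

The probability $P_c$ that exactly one node in $C_i$ and no other node in $D_i$ sends is therefore given by
$$P_c = P_0 \cdot P_{suc} \;\ge\; \frac{1}{2^\beta}\left(\frac{1}{4}\right)^{\frac{55}{2^\beta}}.$$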
$P_c$ is a lower bound for the probability that $C_i$ clears itself in a time-slot $t \in \mathcal{I}'$. The reason for choosing $\beta = 6$ is that this value maximizes the bound on $P_c$. The probability $P_{term}$ that circle $C_i$ does not clear itself during the entire interval is $P_{term} \le (1 - 2^{9/32}/256)^{\delta} \in o(n^{-2})$. We have thus shown that within $\mathcal{I}'$, the sum of sending probabilities in $C_i$ either falls back below $\frac{1}{2^\beta}$ or $C_i$ clears itself.

For the induction step, we consider an arbitrary $t_j^i$. Due to the induction hypothesis, we can assume that all previous such time-slots have already been dealt with. In other words, all previously considered time-slots have either led to a clearance of circle $C_{i'}$ or the sum of probabilities in $C_{i'}$ has decreased below the threshold $\frac{1}{2^\beta}$. Immediately after a clearance, the sum of sending probabilities in a circle $C_i$ is at most $\frac{1}{2^\beta}$, which is the sending probability in the last round of the algorithm. This is true because only one node in the circle remains undecided; all others stop sending on channel $\Gamma_1$. By Lemma 1, the sum of sending probabilities in all neighboring circles (both the cleared and the not cleared ones) is bounded by $\frac{3}{2^\beta}$ in $\mathcal{I}_j^i$ (otherwise, this circle would have been considered before $t_j^i$). Therefore, the bounds (1) and (2) hold with high probability and the computation for the induction step is the same as for the base case $t'$.

Because there are $n$ nodes to be decided and at most $n$ circles $C_i$, the number of induction steps $t_j^i$ is at most $n$. Hence, the probability that the claim holds for all steps is at least $(1 - o(\frac{1}{n^2}))^n \ge 1 - o(\frac{1}{n})$.
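To see why $\beta = 6$ is chosen, one can evaluate the lower bound on $P_c$ from the proof of Lemma 2 for several integer values of $\beta$. The sketch assumes the bound has the form $2^{-\beta}(1/4)^{55/2^\beta}$ (one factor from $P_{suc}$, and $18 \cdot 3/2^\beta$ in the exponent from $P_0$). The helper `min_delta` is hypothetical: it only shows how long an interval would have to be for $(1 - P_c)^\delta \le n^{-2}$, since the actual value of $\delta$ is fixed elsewhere in the paper.

```python
import math


def pc_lower_bound(beta: int) -> float:
    """Bound P_c >= (1/2^beta) * (1/4)^(55/2^beta).

    55 = 1 (from P_suc) + 18 * 3 (18 neighbouring circles, each with a
    sending-probability sum below 3/2^beta, from P_0).
    """
    return 2.0**-beta * 0.25 ** (55 / 2**beta)


def min_delta(n: int, beta: int = 6) -> int:
    """Smallest delta with (1 - P_c)^delta <= n^-2 (illustrative only)."""
    pc = pc_lower_bound(beta)
    return math.ceil(2 * math.log(n) / -math.log1p(-pc))


best_beta = max(range(1, 16), key=pc_lower_bound)
print(best_beta, pc_lower_bound(best_beta), 2 ** (9 / 32) / 256)
print(min_delta(1000))
```

Among integers, $\beta = 6$ minimizes $\beta + 110/2^\beta$, which is the exponent of $1/2$ in the bound, and at $\beta = 6$ the bound equals $2^{9/32}/256$.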

Using Lemma 2, we can now compute the expected number of dominators in each circle $C_i$. In the analysis, we separately compute the number of dominators before and after the termination (i.e., the first clearance) of $C_i$.

Lemma 3. Let $C$ be the number of collisions (more than one node sending in one time-slot on $\Gamma_1$) in a circle $C_i$. The expected number of collisions in $C_i$ before its termination is $E[C] < 6$, and $C < 7 \log n$ with probability at least $1 - o(n^{-2})$.

Proof. Only channel $\Gamma_1$ is considered in this proof. We assume that $C_i$ is not yet terminated and we define the following events:

$A$: Exactly one node in $D_i$ is sending
$X$: More than one node in $C_i$ is sending
$Y$: At least one node in $C_i$ is sending
$Z$: Some node in $D_i \setminus C_i$ is sending

For the proof, we consider only rounds in which at least one node in $C_i$ sends. (No new dominators emerge in $C_i$ if no node sends.) We want to bound the conditional probability $P[A \mid Y]$ that exactly one node $v \in C_i$ in $D_i$ is sending. Using $P[Y \mid X] = 1$ and the fact that $Y$ and $Z$ are independent, we get
$$P[A \mid Y] = P[\bar{X} \mid Y] \cdot P[\bar{Z} \mid Y] = (1 - P[X \mid Y])(1 - P[Z]) = \left(1 - \frac{P[X]}{P[Y]}\right)(1 - P[Z]). \qquad (4)$$
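We can now compute bounds for the probabilities $P[X]$, $P[Y]$, and $P[Z]$:
$$P[X] = 1 - \prod_{k \in C_i} (1 - p_k) - \sum_{k \in C_i} p_k \prod_{l \in C_i,\, l \ne k} (1 - p_l) \le 1 - \left(\frac{1}{4}\right)^{\sum_{k \in C_i} p_k} - \sum_{k \in C_i} p_k \left(\frac{1}{4}\right)^{\sum_{k \in C_i} p_k} = 1 - \left(1 + \sum_{k \in C_i} p_k\right)\left(\frac{1}{4}\right)^{\sum_{k \in C_i} p_k}.$$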



The first inequality for $P[X]$ follows from Fact 1 and inequality (4). Using Fact 1, we can bound $P[Y]$ as $P[Y] = 1 - \prod_{k \in C_i} (1 - p_k) \ge 1 - e^{-\sum_{k \in C_i} p_k}$. In the proof of Lemma 2, we have already computed a bound for $P_0$, the probability that no node in $D_i \setminus C_i$ sends. Using this result, we can write $P[Z]$ as



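$$P[Z] = 1 - \prod_{C_j \in D_i \setminus C_i} \; \prod_{k \in C_j} (1 - p_k) \;\overset{\text{Eq. (3)}}{\le}\; 1 - \left(\frac{1}{4}\right)^{\frac{18 \cdot 3}{2^\beta}}.$$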



Plugging all inequalities into equation (4), we obtain the desired function for $P[A \mid Y]$. It can be shown that the term $P[X]/P[Y]$ is maximized for $\sum_{k \in C_i} p_k = \frac{3}{2^\beta}$ and thus, $P[A \mid Y] = (1 - P[X]/P[Y]) \cdot (1 - P[Z]) > 0.18$.

This shows that whenever a node in $C_i$ sends, $C_i$ terminates with constant probability at least $P[A \mid Y]$. This allows us to bound the expected number of collisions in $C_i$ before the termination of $C_i$ via a geometric distribution, $E[C] \le P[A \mid Y]^{-1} < 6$. The high probability result can be derived as $P[C \ge 7 \log n] = (1 - P[A \mid Y])^{7 \log n} \in O(n^{-2})$.
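The constants in Lemma 3 can be reproduced with a few lines of floating-point arithmetic. This is a sketch under one labelled assumption: the Fact 1 bound on $P[Y]$ is taken to be $P[Y] \ge 1 - e^{-\sum p_k}$, because its exact form is not legible in this copy. The bounds on $P[X]$ and $P[Z]$ follow the proof, $\beta = 6$, and the helper name `p_a_given_y` is hypothetical.

```python
import math

BETA = 6


def p_a_given_y(x: float, beta: int = BETA) -> float:
    """Lower bound on P[A | Y] as a function of x = sum of sending probabilities in C_i.

    P[X] <= 1 - (1 + x)(1/4)^x; P[Y] >= 1 - e^{-x} (assumed form of the Fact 1 bound);
    1 - P[Z] >= (1/4)^(18 * 3 / 2^beta) from Eq. (3).
    """
    p_x = 1 - (1 + x) * 0.25**x
    p_y = 1 - math.exp(-x)
    p_not_z = 0.25 ** (18 * 3 / 2**beta)
    return (1 - p_x / p_y) * p_not_z


# Grid over the range [1/2^beta, 3/2^beta] allowed by inequality (1).
xs = [(1 + 2 * i / 100) / 2**BETA for i in range(101)]
worst = min(p_a_given_y(x) for x in xs)
print(round(worst, 4))               # lower bound used in Lemma 3
print(1 / worst)                     # resulting bound on E[C]
print(7 * math.log2(1 - worst))      # exponent of n in (1 - P[A|Y])^(7 log n)
```

The minimum sits at the upper end $x = 3/2^\beta$, matching the remark that $P[X]/P[Y]$ is maximized there.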



So far, we have shown that the number of collisions before the clearance of $C_i$ is constant in expectation. The next lemma shows that the number of new dominators per collision is also constant. In a collision, each of the sending nodes may already be a dominator. Hence, if we assume that every sending node in a collision is a new dominator, we obtain an upper bound for the true number of new dominators.



Lemma 4. Let $D$ be the number of nodes in $C_i$ sending in a time-slot and let $\Phi$ denote the event of a collision. Given a collision, the expected number of sending nodes is $E[D \mid \Phi] \in O(1)$. Furthermore, $P[D < \log n \mid \Phi] \ge 1 - o(\frac{1}{n})$.

Proof. Let $m$, $m \le n$, be the number of nodes in $C_i$ and $N = \{1, \ldots, m\}$. $D$ is a random variable denoting the number of sending nodes in $C_i$ in a given time-slot. We define $A_k := P[D = k]$ as the probability that exactly $k$ nodes send. Let $\binom{N}{k}$ be the set of all $k$-subsets of $N$ (subsets of $N$ having exactly $k$ elements).



Radio Network Clustering from Scratch 469 



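Defining $A'_k := \sum_{Q \in \binom{N}{k}} \prod_{i \in Q} \frac{p_i}{1 - p_i}$, we can write $A_k$ as
$$A_k = \sum_{Q \in \binom{N}{k}} \left( \prod_{i \in Q} p_i \cdot \prod_{i \notin Q} (1 - p_i) \right) = \sum_{Q \in \binom{N}{k}} \prod_{i \in Q} \frac{p_i}{1 - p_i} \cdot \prod_{i=1}^{m} (1 - p_i) = A'_k \cdot \prod_{i=1}^{m} (1 - p_i). \qquad (5)$$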



Fact 3. The following recursive inequality holds between two subsequent $A'_k$:
$$A'_k \le \frac{1}{k} \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \cdot A'_{k-1}.$$



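Proof. The probability $A_0$ that no node sends is $\prod_{i=1}^{m} (1 - p_i)$ and therefore $A'_0 = 1$, which follows directly from equation (5). For general $A'_k$, we have to group the terms $\prod_{i \in Q} \frac{p_i}{1 - p_i}$ in such a way that we can factor out $A'_{k-1}$:
$$A'_k = \sum_{Q \in \binom{N}{k}} \prod_{j \in Q} \frac{p_j}{1 - p_j} = \frac{1}{k} \sum_{i=1}^{m} \left( \frac{p_i}{1 - p_i} \sum_{Q \in \binom{N \setminus \{i\}}{k-1}} \prod_{j \in Q} \frac{p_j}{1 - p_j} \right) \le \frac{1}{k} \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \sum_{Q \in \binom{N}{k-1}} \prod_{j \in Q} \frac{p_j}{1 - p_j} = \frac{1}{k} \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \cdot A'_{k-1}.$$
The factor $\frac{1}{k}$ appears because every $k$-subset $Q$ is generated exactly $k$ times, once for each choice of $i \in Q$.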
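Equation (5) and Fact 3 can be sanity-checked by brute force for a handful of nodes. The sketch below enumerates all $k$-subsets, so it is exponential and only meant for small, hypothetical probability vectors; `a_prime` and `a_exact` are illustrative names for $A'_k$ and $A_k$.

```python
import math
from itertools import combinations


def a_prime(ratios, k):
    """A'_k: sum over all k-subsets Q of the product of p_i/(1-p_i) for i in Q."""
    return sum(math.prod(c) for c in combinations(ratios, k))


def a_exact(ps, k):
    """A_k = P[D = k], summed directly over all k-subsets of sending nodes."""
    total = 0.0
    for q in combinations(range(len(ps)), k):
        senders = set(q)
        total += math.prod(ps[i] if i in senders else 1 - ps[i] for i in range(len(ps)))
    return total


ps = [0.01, 0.02, 0.005, 0.015, 0.01, 0.03]          # hypothetical sending probabilities
ratios = [p / (1 - p) for p in ps]
base = math.prod(1 - p for p in ps)
for k in range(len(ps) + 1):
    # Equation (5): A_k = A'_k * prod(1 - p_i); Fact 3 bounds A'_k by A'_{k-1}.
    print(k, a_exact(ps, k), a_prime(ratios, k) * base)
```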



We now continue the proof of Lemma 4. The conditional expected value $E[D \mid \Phi]$ is $E[D \mid \Phi] = \sum_{i=2}^{m} (i \cdot P[D = i \mid \Phi]) = \sum_{i=2}^{m} B_i$, where $B_i$ is defined as $B_i := i \cdot P[D = i \mid \Phi]$. For $i \ge 2$, the conditional probability reduces to $P[D = i \mid \Phi] = P[D = i]/P[\Phi]$. In the next step, we consider the ratio between two consecutive terms of $\sum_i B_i$:
$$\frac{B_{k-1}}{B_k} = \frac{(k-1) \cdot P[D = k-1 \mid \Phi]}{k \cdot P[D = k \mid \Phi]} = \frac{(k-1) \cdot P[D = k-1]}{k \cdot P[D = k]} = \frac{(k-1) \cdot A_{k-1}}{k \cdot A_k} = \frac{(k-1) \cdot A'_{k-1}}{k \cdot A'_k}.$$

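It follows from Fact 3 that each term $B_k$ can be upper bounded by
$$B_k = \frac{k \cdot A'_k}{(k-1) \cdot A'_{k-1}} \cdot B_{k-1} \;\overset{\text{Fact 3}}{\le}\; \frac{k}{(k-1) \cdot A'_{k-1}} \cdot \left( \frac{1}{k} \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \cdot A'_{k-1} \right) \cdot B_{k-1} = \frac{1}{k-1} \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \cdot B_{k-1} \le \frac{2}{k-1} \sum_{i=1}^{m} p_i \cdot B_{k-1}.$$
The last inequality follows from $\forall i: p_i \le 1/2$ and $p_i \le 1/2 \Rightarrow \frac{p_i}{1 - p_i} \le 2 p_i$.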

From the definition of $B_k$, it naturally follows that $B_2 \le 2$. Furthermore, we can bound the sum of sending probabilities using Lemma 2 to be less than $\frac{3}{2^\beta}$. We can thus sum up over all $B_i$ recursively in order to obtain $E[D \mid \Phi]$:
$$E[D \mid \Phi] = \sum_{i=2}^{m} B_i \le 2 + \sum_{i=3}^{m} \frac{2}{(i-1)!} \left(\frac{6}{2^\beta}\right)^{i-2} < 2.11.$$



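For the high probability result, we solve the recursion of Fact 3 and obtain $A'_k \le \frac{1}{k!} \left( \sum_{i=1}^{m} \frac{p_i}{1 - p_i} \right)^k$. The probability $P^+ := P[D \ge \log n \mid \Phi]$ is
$$P^+ \le \sum_{k=\lceil \log m \rceil}^{m} P[D = k \mid \Phi] \in O\!\left(\frac{1}{n}\right),$$
where each summand is bounded using the solved recursion together with the bound $\sum_{i=1}^{m} p_i < \frac{3}{2^\beta}$ from Lemma 2; the factor $\frac{1}{\lceil \log m \rceil !}$ dominates the at most $m - \lceil \log m \rceil + 1$ summands.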
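The numerical constant 2.11 in Lemma 4 can be reproduced from the recursion $B_2 \le 2$, $B_i \le \frac{6/2^\beta}{i-1} B_{i-1}$, assuming $\beta = 6$. As a cross-check, the sketch also computes $E[D \mid \Phi]$ exactly for a hypothetical circle of 30 nodes whose sending probabilities sum to the Lemma 2 bound, using a standard Poisson-binomial dynamic program. All function names are hypothetical.

```python
import math

BETA = 6
C = 2 * 3 / 2**BETA   # 2 * (bound 3/2^beta on the sum of sending probabilities)


def series_bound(terms: int = 60) -> float:
    """2 + sum_{i>=3} B_i, using B_2 <= 2 and B_i <= C / (i - 1) * B_{i-1}."""
    total, b = 2.0, 2.0
    for i in range(3, terms):
        b *= C / (i - 1)
        total += b
    return total


def poisson_binomial(ps):
    """Distribution of the number of senders D for independent probabilities ps."""
    dist = [1.0]
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, v in enumerate(dist):
            nxt[k] += v * (1 - p)
            nxt[k + 1] += v * p
        dist = nxt
    return dist


def mean_given_collision(ps):
    """E[D | D >= 2], the expected number of senders given a collision."""
    dist = poisson_binomial(ps)
    p_coll = sum(dist[2:])
    return sum(k * dist[k] for k in range(2, len(dist))) / p_coll


closed_form = 2 + 2 * (math.exp(C) - 1 - C) / C   # the same series summed in closed form
m = 30
ps = [3 / 2**BETA / m] * m                        # hypothetical equal probabilities
print(series_bound(), closed_form, mean_given_collision(ps))
```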



The last key lemma shows that the expected number of new dominators after the termination of circle $C_i$ is also constant.

Lemma 5. Let $A$ be the number of new dominators after the termination of $C_i$. Then, $A \in O(1)$ with high probability.



Proof. We define $B$ and $B_i$ as the set of dominators in $D_i$ and $C_i$, respectively. Immediately after the termination of $C_i$, only one node in $C_i$ remains sending on channel $\Gamma_1$, because all others will be decided. Because $C$ and $D \mid \Phi$ are independent variables, it follows from Lemmas 3 and 4 that $|B_i| \le r' \log^2 n$ for a small constant $r'$. Potentially, all $C_j \in D_i$ are already terminated and therefore, $1 \le |B| \le 19 \cdot r' \log^2 n = r \log^2 n$ for $r := 19 \cdot r'$. We distinguish the two cases $1 \le |B| \le r \log n$ and $r \log n < |B| \le r \log^2 n$ and consider channels $\Gamma_2$ and $\Gamma_3$ in the first and second case, respectively. We show that in either case, a new node will receive a message on one of the two channels with high probability during the algorithm's waiting period.

First, consider case one, i.e., $1 \le |B| \le r \log n$. The probability $P_q$ that one dominator is sending alone on channel $\Gamma_2$ is $P_q = |B| \cdot q \cdot (1 - q)^{|B| - 1}$. This is a concave function in $|B|$. For $|B| = 1$, we get $P_q = q = (2^\beta \cdot \lceil \log n \rceil)^{-1}$ and for $|B| = r \log n$, $n \ge 2$, we have



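$$P_q = \frac{r \log n}{2^\beta \lceil \log n \rceil} \left(1 - \frac{1}{2^\beta \lceil \log n \rceil}\right)^{r \log n - 1} \;\overset{\text{Fact 2}}{\ge}\; \frac{r \log n}{2^\beta \lceil \log n \rceil}\, e^{-r/2^\beta} \left(1 - \frac{(r/2^\beta)^2}{r \log n}\right) \;\overset{(n \ge 2)}{\ge}\; \frac{r}{2^{\beta+1}}\, e^{-r/2^\beta} \left(1 - \frac{(r/2^\beta)^2}{r \log n}\right) \in \Theta(1).$$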
A newly awakened node in a terminated circle $C_i$ will not send during the first $\delta \cdot \lceil \log n \rceil$ rounds. If during this period the node receives a message from an existing dominator, it becomes decided and hence does not become a dominator. Since $P_q$ is concave in $|B|$, its minimum over $1 \le |B| \le r \log n$ is attained at one of the two endpoints, so $P_q \ge (2^\beta \cdot \lceil \log n \rceil)^{-1}$ for sufficiently large $n$. The probability that such an already covered node does not receive any message from an existing dominator is therefore bounded by $P_{no} \le (1 - (2^\beta \cdot \lceil \log n \rceil)^{-1})^{\delta \lceil \log n \rceil} \le e^{-\delta/2^\beta} \in O(n^{-2})$. Hence, with high probability, the number of new dominators is bounded by a constant in this case. The analysis for the second case follows along the same lines (using $\Gamma_3$ instead of $\Gamma_2$) and is omitted.
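The endpoint argument in case one of Lemma 5 can be illustrated numerically. The sketch assumes base-2 logarithms, $\beta = 6$, a hypothetical constant $r = 4$, and a hypothetical choice $\delta = \lceil 2^{\beta+1} \ln n \rceil$ that makes $e^{-\delta/2^\beta} \le n^{-2}$; none of these values are taken from the paper, and the function names are illustrative.

```python
import math


def p_alone(b: int, q: float) -> float:
    """Probability that exactly one of b dominators sends, each independently with probability q."""
    return b * q * (1 - q) ** (b - 1)


def q_for(n: int, beta: int = 6) -> float:
    """q = 1 / (2^beta * ceil(log2 n)); base-2 logarithm assumed."""
    return 1 / (2**beta * math.ceil(math.log2(n)))


def p_no(n: int, delta: int, beta: int = 6) -> float:
    """Probability that a waiting node hears nothing during delta * ceil(log2 n) slots,
    using the worst case P_q >= q in every slot."""
    return (1 - q_for(n, beta)) ** (delta * math.ceil(math.log2(n)))


n, r = 2**16, 4                                  # r is a hypothetical constant
q = q_for(n)
values = [p_alone(b, q) for b in range(1, int(r * math.log2(n)) + 1)]
delta = math.ceil(2**7 * math.log(n))            # hypothetical: e^(-delta/2^beta) <= n^-2
print(min(values) == q)                          # the worst case is |B| = 1
print(p_no(n, delta), math.exp(-delta / 2**6), n**-2)
```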

Finally, we formulate and prove the main theorem.

Theorem 2. The algorithm computes a correct dominating set in time $O(\log^2 n)$ and achieves an approximation ratio of $O(1)$ in expectation.

Proof. Correctness and running time follow from Theorem 1. For the approximation ratio, consider a circle $C_i$. The expected number of dominators in $C_i$ before the termination of $C_i$ is $E[D] = E[C] \cdot E[D \mid \Phi] \in O(1)$ by Lemmas 3 and 4. By Lemma 5, the number of dominators emerging after the termination of $C_i$ is also constant. The theorem follows from the fact that the optimal solution must choose at least one dominator in $D_i$.
Abstract. For planar graphs, counting the number of perfect matchings (and hence determining whether there exists a perfect matching) can be done in NC [4,10]. For planar bipartite graphs, finding a perfect matching when one exists can also be done in NC [8,7]. However, in general planar graphs (when the bipartite condition is removed), no NC algorithm for constructing a perfect matching is known.

We address a relaxation of this problem. We consider the fractional matching polytope $\mathcal{P}(G)$ of a planar graph $G$. Each vertex of this polytope is either a perfect matching, or a half-integral solution: an assignment of weights from the set $\{0, 1/2, 1\}$ to each edge of $G$ so that the weights of edges incident on each vertex of $G$ add up to 1 [6]. We show that a vertex of this polytope can be found in NC, provided $G$ has at least one perfect matching to begin with. If, furthermore, the graph is bipartite, then all vertices are integral, and thus our procedure actually finds a perfect matching without explicitly exploiting the bipartiteness of $G$.



1 Introduction 

The perfect matching problem is of fundamental interest to combinatorists, algorithmists and complexity-theorists for a variety of reasons. In particular, the problems of deciding if a graph has a perfect matching, and finding such a matching if one exists, have received considerable attention in the field of parallel algorithms. Both these problems are in randomized NC [5,2,9] but are not known to be in deterministic NC. (NC is the class of problems with parallel algorithms running in polylogarithmic time using polynomially many processors.) For special classes of graphs, however, there are deterministic NC algorithms.

In this work, we focus on planar graphs. These graphs are special for the following reason: for planar graphs, we can count the number of perfect matchings in NC ([4,10], see also [3]), but we do not yet know how to find one, if one exists. This counters our intuition that decision and search versions of problems are easier than their counting versions. In fact, for the perfect matching problem itself, while decision and search are both in P, counting is #P-hard and hence is believed to be much harder. But for planar graphs the situation is curiously inverted.

* Part of this work was done when this author was visiting the Institute of Mathematical Sciences, Chennai on a summer student programme.

S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 472-483, 2004.

We consider a relaxation of the search version and show that it admits a deterministic NC algorithm. The problem we consider is the following: For a graph $G = (V, E)$ with $|V| = n$ vertices and $|E| = m$ edges, consider the $m$-dimensional space $\mathbb{Q}^m$. A point $(x_1, x_2, \ldots, x_m)$ in this space can be viewed as an assignment of weight $x_i$ to the $i$th edge of $G$. A perfect matching in $G$ is such an assignment (matched edges have weight 1, other edges have weight 0) and hence is a point in this space.

Consider the polytope $\mathcal{P}(G)$ defined by the following equations.
$$x_e \ge 0 \quad \forall e \in E \qquad (1)$$
$$\sum_{e \text{ incident on } v} x_e = 1 \quad \forall v \in V \qquad (2)$$

Clearly, every perfect matching of $G$ (i.e., the corresponding point in $\mathbb{Q}^m$) lies in $\mathcal{P}(G)$. Standard matching theory (see for instance [6]) tells us that every perfect matching of $G$ is a vertex of $\mathcal{P}(G)$. In fact, if we denote by $\mathcal{M}(G)$ the convex hull of all perfect matchings, then $\mathcal{M}(G)$ is a facet of $\mathcal{P}(G)$. In the case of bipartite graphs, the perfect matchings are in fact the only vertices of $\mathcal{P}(G)$; $\mathcal{P}(G) = \mathcal{M}(G)$. Thus for a bipartite graph it suffices to find a vertex of $\mathcal{P}(G)$ to get a perfect matching. Furthermore, for general graphs, all vertices of $\mathcal{P}(G)$ are always half-integral (in the set $\{0, 1/2, 1\}^m$). For any vertex $w$ of $\mathcal{P}(G)$, if we pick those edges of $G$ having non-zero weight in $w$, we get a subgraph which is a disjoint union of a partial matching and some odd cycles.

For instance, Figure 1 shows a graph where $\mathcal{M}(G)$ is empty, while $\mathcal{P}(G)$ has one point shown by the weighted graph alongside. Figure 2 shows a graph where $\mathcal{M}(G)$ is non-empty; furthermore, the weighted graph in Figure 2(b) is a vertex of $\mathcal{P}(G)$ not in $\mathcal{M}(G)$.

We show in this paper that finding a vertex of $\mathcal{P}(G)$ is in NC when $G$ is a planar graph with at least one perfect matching. If, furthermore, the graph is bipartite, then all vertices are integral, and thus our procedure actually finds a perfect matching, without explicitly exploiting the bipartiteness of $G$.





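Fig. 1. An Example Graph with empty $\mathcal{M}(G)$, and a point in $\mathcal{P}(G)$

Fig. 2. An Example Graph with non-empty $\mathcal{M}(G)$, and a point in $\mathcal{P}(G) \setminus \mathcal{M}(G)$ (panels: $G$; a vertex of $\mathcal{P}(G)$ not in $\mathcal{M}(G)$)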
Our approach is as follows. In [7], an NC procedure is described to find a point $p$ inside $\mathcal{P}(G)$, using the fact that counting the number of perfect matchings in a planar graph is in NC. Then, exploiting the planarity and bipartiteness of the graph, an NC procedure is described to find a vertex of $\mathcal{P}(G)$ by constructing a path, starting from $p$, that never leaves $\mathcal{P}(G)$ and that makes measurable progress towards a vertex. We extend the same approach to general planar graphs. In fact, we start at the same point $p$ found in [7]. Then, using only planarity of $G$, we construct a path, starting from $p$ and moving towards a vertex of $\mathcal{P}(G)$. Our main contribution, thus, is circumventing the bipartite restriction. We prove that our algorithm indeed reaches a vertex of $\mathcal{P}(G)$ and can be implemented in NC.

This paper is organised as follows. In Section 2 we briefly describe the 
Mahajan-Varadarajan algorithm [7]. In Section 3 we describe our generalisation 
of their algorithm, and in the subsequent two sections we prove the correctness 
and the NC implementation of our method. 

2 Reviewing the Mahajan-Varadarajan Algorithm 

The algorithm of Mahajan & Varadarajan [7] is quite elegant and straightforward and is summarised below.

For planar graphs, the number of perfect matchings can be found in NC [4,10]. Using this procedure as a subroutine, and viewing each perfect matching as a point in $\mathbb{Q}^m$, obtain the point $p$ which is the average of all perfect matchings; clearly, this point is inside $\mathcal{M}(G)$. Now construct a path from $p$ to some vertex of $\mathcal{M}(G)$, always staying inside $\mathcal{M}(G)$.
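For small graphs, the starting point $p$ can be computed directly as the average of all perfect matchings. The sketch below swaps the NC counting procedure of [4,10] for brute-force enumeration, so it runs in exponential time and only illustrates the formula $x_e = (\#G - \#G_e)/\#G$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def perfect_matchings(n, edges):
    """All perfect matchings of a graph on vertices 0..n-1, as sets of edge indices.

    Brute force over (n/2)-subsets of edges: exponential time, for illustration only.
    """
    if n % 2:
        return []
    matchings = []
    for subset in combinations(range(len(edges)), n // 2):
        covered = set()
        for i in subset:
            covered.update(edges[i])
        if len(covered) == n:            # n/2 edges covering n vertices are pairwise disjoint
            matchings.append(set(subset))
    return matchings


def average_point(n, edges):
    """x_e = (#G - #G_e) / #G, i.e. the fraction of perfect matchings that contain e."""
    ms = perfect_matchings(n, edges)
    if not ms:
        raise ValueError("graph has no perfect matching")
    return [Fraction(sum(i in m for m in ms), len(ms)) for i in range(len(edges))]


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(average_point(4, k4))
```

For $K_4$ each edge lies in exactly one of the three perfect matchings, so every edge gets weight $1/3$ and every vertex sum is 1.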

The Algorithm. 

— Find a point $p \in \mathcal{M}(G)$: $\forall e \in E$, assign $x_e = (\#G - \#G_e)/\#G$, where $\#G$ denotes the number of perfect matchings in $G$ and $G_e$ denotes the graph obtained by deleting edge $e$ from $G$. Delete edges of 0 weight.

— While the graph is not acyclic,

• Get $cf$ edge-disjoint cycles (where $f$ is the number of faces in a planar embedding of $G$) using the fact that the number of edges is less than three times the number of vertices. (Here $c$ is a constant, $c = 1/24$. We consider the subgraph of the dual containing faces with fewer than 12 bounding edges. A maximal independent set in this graph is sufficiently large, and gives the desired set of edge-disjoint cycles.) Since the graph is bipartite, all the cycles are of even length.

• Destroy each of these cycles by removing the smallest weight edge in it. To stay inside the polytope, manipulate the weights of the remaining edges as follows: in a cycle $C$, if $e$ is the smallest weight edge with weight $x_e$, then add $x_e$ to the weights of edges at odd distances from $e$ and subtract $x_e$ from the weights of edges at even distances from $e$. Thus the edge $e$ itself gets weight 0. Delete all 0-weight edges.
— Finally we get an acyclic graph and we are still inside the polytope. It is easy to check that any acyclic graph inside $\mathcal{M}(G)$ must be a perfect matching (integral).
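The cycle-destroying step for an even cycle can be sketched as follows, with exact rational arithmetic. Edges are listed in cyclic order, so edges $i-1$ and $i$ share a vertex; the function name is hypothetical. Because each vertex of the cycle sees one edge at even and one at odd distance from $e$, the weight sum at every cycle vertex is unchanged.

```python
from fractions import Fraction


def destroy_even_cycle(weights):
    """One cycle-destroying step on an even cycle.

    weights[i] is the weight of the i-th edge around the cycle, so edges i-1 and i
    meet at a common vertex. The smallest-weight edge e is driven to 0: x_e is
    subtracted from edges at even distance from e and added to edges at odd distance.
    """
    length = len(weights)
    if length % 2:
        raise ValueError("cycle must have even length")
    e = min(range(length), key=lambda i: weights[i])
    xe = weights[e]
    # On an even cycle, the parity of (i - e) mod length equals the parity of the distance.
    return [w - xe if (i - e) % length % 2 == 0 else w + xe for i, w in enumerate(weights)]


before = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 2)]
after = destroy_even_cycle(before)
print(after)
```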

Since we are destroying $cf$ faces each time, within $O(\log f)$ iterations of the while loop we will end up with an acyclic graph. Hence, for bipartite planar graphs finding a perfect matching is in NC.

3 Finding a Half-Integral Solution in a Planar Graph in NC

The following result is a partial generalization of the previous result. 

Theorem 1. For planar graphs, a vertex of the fractional matching polytope $\mathcal{P}(G)$ (i.e., a half-integral solution to the equations defining $\mathcal{P}(G)$, with no even cycles) can be found in NC, provided that the perfect matching polytope $\mathcal{M}(G)$ is non-empty.

Our starting point is the same point p computed in the previous section; 
namely, the arithmetic mean of all perfect matchings of G. Starting from p, we 
attempt to move towards a vertex. The basic strategy is to find a large set S 
of edge-disjoint faces. Each such face contains a simple cycle, which we try to 
destroy. Difficulties arise if the edges bounding the faces in S do not contain 
even length simple cycles, since the method of the previous section works only 
for even cycles. We describe mechanisms to be used successively in such cases. 

We first describe some basic building blocks, and then describe how to put 
them together. 

Building Block 1: Simplify, or Standardise, the graph $G$.

Let $G$ be the current graph, let $x : E \to \mathbb{Q}$ be the current assignment of weights to edges, and let $y$ be the partial assignment finalised so far. The final assignment is $y : E \to \{0, 1/2, 1\}$.

Step 1.1. For each $e = (u, v) \in E(G)$, if $x_e = 0$, then set $y_e = 0$ and delete $e$ from $G$.

Step 1.2. For each $e = (u, v) \in E(G)$, if $x_e = 1$, then set $y_e = 1$ and delete $u$ and $v$ from $G$.

(This step ensures that all vertices of $G$ have degree at least 2.)

Step 1.3. Obtain connected components of $G$.

If a component is an odd cycle $C$, then every edge on $C$ must have weight 1/2. For each $e \in C$, set $y_e = 1/2$. Delete all the edges and vertices of $C$ from $G$.

If a component is an even cycle $C$, then for some $0 < a < 1$, the edges on $C$ alternately have weights $a$ and $1 - a$. For each $e \in C$, if $x_e = a$ then set $y_e = 1$ and if $x_e = 1 - a$ then set $y_e = 0$. Delete all the edges and vertices of $C$ from $G$.
Step 1.4. Let $E'$ be the set of edges touching a vertex of degree 2 in $G$. Consider the subgraph of $G$ induced by $E'$; this is a disjoint collection of paths. Collapse each such even path to a path of length 2 and each such odd path to a path of length 1, reassigning weights as shown in Figure 3. Again, we stay within the polytope of the new graph, and from any assignment here, a point in $\mathcal{P}(G)$ can be recovered in a straightforward way.

This step ensures that no degree 2 vertex has a degree 2 neighbour.




Fig. 3. Transformation assuring that no two consecutive vertices are of degree 2 



Step 1.5. For each $v \in V(G)$, if $v$ has degree more than 3, then introduce some new vertices and edges, rearrange the edges touching $v$, and assign weights as shown in Figure 4. This assignment in the new graph is in the corresponding polytope of the new graph, and from any assignment here, a point in $\mathcal{P}(G)$ can be recovered in a straightforward way. (This gadget construction was in fact first used in [1].)

This step ensures that all vertices have degree 2 or 3. 





Fig. 4. Transformation to remove vertices of degree greater than 3 
Note that Steps 1.4 and 1.5 above change the underlying graph. To recover the point in $\mathcal{P}(G)$ from a point in the new graph's polytope, we can initially allocate one processor per edge. This processor will keep track of which edge in the modified graph dictates the assignment to this edge. Whenever any transformation is done on the graph, these processors update their respective data, so that recovery at the end is possible.

We will call a graph on which the transformations of Building Block 1 have been done a standardised graph.
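Steps 1.1 and 1.2 of Building Block 1 are simple enough to sketch directly. The representation (edge ids mapped to endpoint pairs, weights as `Fraction`s) and the function name are assumptions; Steps 1.3 to 1.5 and the recovery bookkeeping are not shown.

```python
from fractions import Fraction


def standardise_steps_1_1_to_1_2(edges, x):
    """Steps 1.1 and 1.2 of Building Block 1 (sketch).

    edges: dict edge_id -> (u, v); x: dict edge_id -> weight.
    Returns the surviving edge ids and the partial assignment y.
    """
    y = {}
    alive = set(edges)
    # Step 1.1: fix zero-weight edges to 0 and delete them.
    for e in sorted(alive):
        if x[e] == 0:
            y[e] = 0
            alive.discard(e)
    # Step 1.2: fix weight-1 edges to 1 and delete both endpoints with all incident edges.
    dead = {v for e in alive if x[e] == 1 for v in edges[e]}
    for e in sorted(alive):
        if x[e] == 1:
            y[e] = 1
    alive = {e for e in alive if not (set(edges[e]) & dead)}
    return alive, y


edges = {"a": (0, 1), "b": (1, 2), "c": (2, 0), "d": (2, 3), "e": (3, 4)}
x = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1, 2),
     "d": Fraction(0), "e": Fraction(1)}
print(standardise_steps_1_1_to_1_2(edges, x))
```

In this example the surviving edges form a triangle with weights $1/2$, which Step 1.3 would then finalise as an odd-cycle component.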

Building Block 2: Process an even cycle. This is as in [7], and is described in Section 2.

Building Block 3: Process an odd cycle connected to itself by a path. Let $C$ be such an odd cycle, with path $P$ connecting $C$ to itself. We first consider the case when $P$ is a single edge, i.e., a chord. The chord $(u, v)$ cuts the cycle into paths $P_1, P_2$. Let $C_i$ denote the cycle formed by $P_i$ along with the chord $(u, v)$. Exactly one of $C_1, C_2$ is even; process it as in Building Block 2.

If instead of a chord, there is some path $P_{u,v}$ connecting $u$ and $v$ on $C$, the same reasoning holds and so this step can still be performed.
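For the chord case of Building Block 3, choosing the even cycle only requires comparing arc lengths. A minimal sketch, assuming the odd cycle is given as a vertex list in cyclic order; the function name is hypothetical.

```python
def even_cycle_from_chord(cycle, u, v):
    """Building Block 3 with a chord (sketch).

    cycle: vertices of an odd cycle in cyclic order; (u, v): a chord between two of them.
    Each arc plus the chord is a cycle whose number of edges equals the number of
    vertices on the arc. The two edge counts sum to len(cycle) + 2, which is odd,
    so exactly one of the two cycles is even. Returns its vertex list.
    """
    if len(cycle) % 2 == 0:
        raise ValueError("expected an odd cycle")
    i, j = sorted((cycle.index(u), cycle.index(v)))
    arc1 = cycle[i:j + 1]               # from cycle[i] forward to cycle[j]
    arc2 = cycle[j:] + cycle[:i + 1]    # from cycle[j] forward around to cycle[i]
    return arc1 if len(arc1) % 2 == 0 else arc2


print(even_cycle_from_chord([0, 1, 2, 3, 4], 0, 2))  # [2, 3, 4, 0]
```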

Building Block 4: Process a pair of edge-disjoint odd cycles connected by a path.

Let $C_1$ and $C_2$ be the odd cycles and $P$ the path connecting them. Note that if $G$ is standardised, then $P$ cannot be of length 0. Let $P$ connect to $C_1$ at $u$ and to $C_2$ at $v$. Then the traversal of $C_1$ beginning at $u$, followed by path $P$ going from $u$ to $v$, then the traversal of $C_2$ beginning at $v$, followed by the path $P$ going from $v$ to $u$, is a closed walk of even length (its length $|C_1| + |C_2| + 2|P|$ is the sum of two odd numbers and an even number). We make two copies of $P$, one for each direction of traversal. For edge $e$ on $P$, assign weight $x_e/2$ to each copy. Now treating the two copies as separate, we have an even cycle which can be processed according to Building Block 2. For each edge $e \in P$, its two copies are at even distance from each other, so either both increase or both decrease in weight. It can be seen that after this adjustment, the weights of the copies put together are still between 0 and 1.

This step is illustrated in Figure 5. The edge on the path, which has weight $a$, is split into two copies with weight $a/2$ each. The dotted edge is the minimum weight edge; thus $w \le a/2$.
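The closed walk of Building Block 4 can be built explicitly, which makes the parity claims easy to check. The sketch assumes both cycles are given as vertex lists starting at the attachment vertices $u$ and $v$; it only constructs the walk and does not perform the weight splitting or the Building Block 2 update. The names are hypothetical.

```python
def closed_walk(c1, c2, path):
    """Closed walk of Building Block 4 as a list of directed edges (sketch).

    c1, c2: vertex lists of two odd cycles in cyclic order, starting at u and v.
    path: vertex list of P from u to v. Every edge of P appears twice, once per direction.
    """
    if c1[0] != path[0] or c2[0] != path[-1]:
        raise ValueError("cycles must start at the endpoints of the path")
    walk = [(c1[i], c1[(i + 1) % len(c1)]) for i in range(len(c1))]   # around C1
    walk += list(zip(path, path[1:]))                                 # u -> v along P
    walk += [(c2[i], c2[(i + 1) % len(c2)]) for i in range(len(c2))]  # around C2
    back = path[::-1]
    walk += list(zip(back, back[1:]))                                 # v -> u along P
    return walk


walk = closed_walk([0, 1, 2], [4, 5, 6], [0, 3, 4])
print(len(walk), walk)
```

For each path edge, the positions of its two copies in the walk differ by an even number, so Building Block 2 moves both copies in the same direction.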

The Algorithm. The idea is to repeatedly identify large sets of edge-disjoint faces, and then manipulate them, in the process destroying them. The faces are identified as in [7], and a simple cycle is extracted from each face. Even cycles are processed using Building Block 2. By Building Blocks 3 and 4, odd cycles can also be processed provided we identify paths connecting the cycles to themselves or to other cycles. However, to achieve polylogarithmic time, we need to process several odd cycles simultaneously, and this requires that the odd cycles and the connecting paths be edge-disjoint.

We use the following definition: A path $P$ is said to be a 3-bounded path if the number of internal vertices of $P$ with degree 3 is at most 1. Note that in a standardised graph, a 3-bounded path can have at most 3 internal vertices.
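A direct test for the 3-bounded property, plus a brute-force check of the remark that such a path has at most 3 internal vertices in a standardised graph (internal degrees 2 or 3, no two consecutive degree-2 vertices). Both helpers are hypothetical illustrations, not part of the paper's algorithm.

```python
from itertools import product


def is_3_bounded(path, degree):
    """True if at most one internal vertex of the path has degree 3."""
    return sum(1 for v in path[1:-1] if degree[v] == 3) <= 1


def max_internal_vertices(limit=6):
    """Longest internal degree sequence of a 3-bounded path in a standardised graph.

    Internal vertices have degree 2 or 3, no two consecutive ones both have degree 2,
    and at most one has degree 3. Brute force over sequences of up to `limit` vertices.
    """
    best = 0
    for length in range(limit + 1):
        for degs in product((2, 3), repeat=length):
            no_adjacent_twos = all(not (a == 2 and b == 2) for a, b in zip(degs, degs[1:]))
            if no_adjacent_twos and degs.count(3) <= 1:
                best = max(best, length)
    return best


print(max_internal_vertices())
```

The extremal pattern is degree sequence 2, 3, 2 on the internal vertices.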
Fig. 5. Manipulating a closed walk of even length



The algorithm can be described as follows:

1. Find a point $p \in \mathcal{M}(G)$ and initialise accordingly.

2. Standardise $G$ (Building Block 1; this builds a partial solution in $y$).

3. While $G$ is not empty, repeat the following steps, working in parallel on each connected component of $G$:

a) Find a collection $S$ of edge-disjoint faces in $G$, including at least 1/24 of the faces from each component. Extract a simple cycle from the edges bounding each face of $S$, to obtain a collection of simple cycles $T$.

b) Process all even cycles (Building Block 2). Remove these cycles from $T$. Re-standardise.

c) Define each surviving cycle in $T$ to be a cluster. (At later stages, a cluster will be a set of vertices and the subgraph induced by this subset.)

While $T$ is non-empty, repeat the following steps:

i. Construct an auxiliary graph $H$ with clusters as vertices. $H$ has an edge between clusters $D_1$ and $D_2$ if there is a 3-bounded path between some vertex of $D_1$ and some vertex of $D_2$ in $G$.

ii. Process all clusters having a self-loop in $H$ (Building Block 3). Remove these clusters from $T$. Re-standardise.

iii. Recompute $H$. In $H$, find a maximal matching. Each matched edge pairs two clusters, between which there is a 3-bounded path in $G$. In parallel, process all these pairs along with the connecting path using Building Block 4. Remove the processed clusters from $T$ and re-standardise.

iv. “Grow” each cluster: if $D$ is the set of vertices in a cluster, then first add to $D$ all degree-2 neighbours of $D$, then add to $D$ all degree-3 neighbours of $D$. That is,
$D = D \cup \{v \mid d(v) = 2 \wedge \exists u \in D, (u, v) \in E\}$,
$D = D \cup \{v \mid d(v) = 3 \wedge \exists u \in D, (u, v) \in E\}$.

4. Return the edge weights stored in $y$.
4 Correctness

The correctness of the algorithm follows from the following series of lemmas: 

Lemma 1. The clusters and 3-bounded paths processed in Step 3(c) are vertex-disjoint.

Proof. Each iteration of the while loop in Step 3(c) operates on different parts of $G$. We show that these parts are edge-disjoint, and in fact even vertex-disjoint. Clearly, this is true when we enter Step 3(c); the cycles are edge-disjoint by our choice in Step 3(a), and since $G$ has maximum degree 3, no vertex can be on two edge-disjoint cycles. Changes happen only in Steps 3(c)(ii) and 3(c)(iii); we analyze these separately.

Consider Step 3(c)(ii). If two clusters $D_1$ and $D_2$ have self-loops, then there are 3-bounded paths $p_i$ from $D_i$ to $D_i$, $i = 1, 2$. If these paths share a vertex $v$, it can only be an internal vertex of $p_1$ and $p_2$, since the clusters were vertex-disjoint before this step. In particular, $v$ cannot be a degree-2 vertex. But since $G$ is standardised, $\deg(v)$ is then 3, which does not allow it to participate in two such paths. So $p_1$ and $p_2$ must be vertex-disjoint. Thus processing them in parallel via Building Block 3 is valid. Processing clusters with self-loops merely removes them from $T$; thus the clusters surviving after Step 3(c)(ii) continue to be vertex-disjoint.

Now consider Step 3(c)(iii). Suppose cluster $D_1$ is matched to $D_2$ via 3-bounded path $p$, and $D_3$ to $D_4$ via 3-bounded path $p'$. Note that $D_i \ne D_j$ for $i \ne j$, since we are considering a matching in $H$. Thus, by the same argument as above, the paths $p$ and $p'$ must be vertex-disjoint. Thus processing them in parallel via Building Block 4 is valid. Processing matched clusters removes them from $T$; the remaining clusters continue to be vertex-disjoint.

Since Step 3(c)(iii) considers a maximal matching, the clusters surviving are not only vertex-disjoint but also not connected to each other by any 3-bounded path. □

Lemma 2. Each invocation of the while loop inside Step 3(c) terminates in finite time.

Proof. To establish this statement, we will show that clusters which survive Steps 3(c)(ii) and 3(c)(iii) grow appreciably in size. In particular, they double in each iteration of the while loop. Clearly, clusters cannot double indefinitely while remaining vertex-disjoint, so the statement follows. In fact, our proof establishes that the while loop in Step 3(c) executes $O(\log n)$ times on each invocation.

Let $G$ denote the graph at the beginning of Step 3(c). Consider a cluster at this point. Let $D_0$ be the set of vertices in the cluster. Consider the induced subgraph $G_{D_0}$ on $D_0$. Notice that each such $G_{D_0}$ contains exactly one cycle, which is an odd cycle extracted in Step 3(a).

We trace the progress of cluster $D_0$. Let $D_i$ denote the cluster (or the associated vertex set; we use this notation interchangeably to mean both) resulting from $D_0$ after $i$ iterations of the while loop of Step 3(c). If $D_0$ does not survive $i$ iterations, then $D_i$ is empty.
For any cluster D, let 3-size(D) denote the number of vertices in D whose
degree in G is 3. Let D' denote the vertices of D whose degree in G is 3 but
whose degree in G_D is 1.

We establish the following claim: 

Claim. For a cluster D_0 surviving i + 1 iterations of the while loop of Step 3(c),
G_{D_i} contains exactly one cycle, and furthermore,

$$|D'_i| \ge \lfloor 2^{i-1} \rfloor$$

$$\text{3-size}(D_i) \ge 2^i$$

Proof of Claim: As mentioned earlier, G_{D_0} contains exactly one cycle. Thus
|D'_0| = 0. In fact, each G_{D_j}, j ≤ i, contains just this one cycle, because
if any other cycle were present in G_{D_j}, then a self-loop would be found at the
(j + 1)th stage and the cluster would have been processed and deleted from T
in Step 3(c)(ii); it would not grow (i + 1) times.

It remains to establish the claims on the sizes of D_i and D'_i. We establish
these claims explicitly for i ≤ 1, and by induction for i > 1.

Consider i = 0. Clearly, 3-size(D_0) ≥ 2^0 = 1, and ⌊2^{-1}⌋ = 0.

Now consider i = 1. We know that D_0 has gone through two “Grow” phases,
and that G_{D_1} has only one cycle. Notice that each degree-3 vertex in D_0 con-
tributes one vertex outside D_0; if its third, non-cycle neighbour were also on
the cycle, then the cycle would have a chord, detected in Step 3(c)(ii), and D_0
would not grow even once. In fact, since D_0 grows twice, these neighbours are not
only outside the cycle but also distinct from each other. Thus for each vertex
contributing to 3-size(D_0), one degree-3 vertex is added to D_1, and these vertices
are distinct. All these vertices are in D'_1, giving |D'_1| = 3-size(D_0) ≥ 1, and
3-size(D_1) = 3-size(D_0) + |D'_1| ≥ 2.

To complete the induction, assume that the claim holds for i − 1, where i > 1.
In this case, ⌊2^{i-2}⌋ = 2^{i-2}. Thus 3-size(D_{i-1}) ≥ 2^{i-1}, and |D'_{i-1}| ≥ 2^{i-2}.

Each u ∈ D'_{i-1} has two neighbours, u_1 and u_2, not in D_{i-1}. The vertices
contributed by the members of D'_{i-1} must be distinct, since otherwise D_{i-1}
would have a 3-bounded path to itself and would be processed at the ith stage;
it would not grow the ith time. Furthermore, if u_1 is of degree 2, let u'_1 denote
its degree-3 neighbour other than u; otherwise let u'_1 = u_1 (and define u'_2
similarly). By the same reasoning, the vertices u'_1, u'_2 contributed by each
u ∈ D'_{i-1} must also be distinct. So 2|D'_{i-1}| vertices are added to D_{i-1} in
obtaining D_i. All these new vertices must be in D'_i as well, since otherwise D_i
would have a 3-bounded path to itself and would be processed at the (i + 1)th
stage; it would not grow the (i + 1)th time. Hence
|D'_i| = 2|D'_{i-1}| ≥ 2 · 2^{i-2} = 2^{i-1}.

Every degree-3 vertex of D_{i-1} continues to be in D_i and contributes to
3-size(D_i). Furthermore, all the vertices of D'_i are not in D_{i-1} and also contribute
to 3-size(D_i). Thus 3-size(D_i) = 3-size(D_{i-1}) + |D'_i| ≥ 2^{i-1} + 2^{i-1} = 2^i. □

□ 
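To see how the Claim yields the O(log n) bound of Lemma 2, one can trace the case where every inequality in the induction is tight, starting from 3-size(D_0) = 1 and |D'_0| = 0. The sketch below (hypothetical function names) reproduces 3-size(D_i) = 2^i and |D'_i| = 2^{i-1} for i ≥ 1. Since 3-size(D_i) cannot exceed n, a cluster can survive i + 1 iterations only if 2^i ≤ n, that is, i ≤ ⌊log_2 n⌋.

```python
def claim_lower_bounds(k: int) -> tuple[list[int], list[int]]:
    """Extremal sizes when every inequality in the Claim holds with equality.

    s[i] stands for 3-size(D_i) and d[i] for |D'_i|, starting from the smallest
    values allowed at i = 0: 3-size(D_0) = 1 and |D'_0| = 0.
    """
    s, d = [1], [0]
    for i in range(1, k + 1):
        # i = 1: each degree-3 vertex of D_0 contributes one new vertex of D'_1.
        # i > 1: each u in D'_{i-1} contributes two new vertices of D'_i.
        d_i = s[0] if i == 1 else 2 * d[i - 1]
        d.append(d_i)
        s.append(s[i - 1] + d_i)
    return s, d


def max_doublings(n: int) -> int:
    """Largest i with 2**i <= n, i.e. floor(log2 n), for n >= 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1
```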
Lemma 3. The while loop of Step 3 terminates in finite time.

Proof. Suppose some iteration of the while loop in Step 3 does not delete any
edge. This means that Step 3(b) does nothing, so S has no even cycles, and Step
3(c) deletes nothing, so the clusters keep growing. But by the preceding claim,
the clusters can grow at most O(log n) times; beyond that, either Step 3(c)(ii)
or Step 3(c)(iii) must get executed.

Thus each iteration of the while loop of Step 3 deletes at least one edge from
G, so the while loop terminates in finite time. □



Lemma 4. After Step 2, and after each iteration of the while loop of Step 3, we
have a point inside P(G).

Proof. It is easy to see that all the building blocks described in Section 3 preserve
membership in P(G). Hence the point obtained after Step 2 is clearly inside
P(G). During Step 3, various edges are deleted by processing even closed walks.
By our choice of S, the even cycles processed simultaneously in Step 3(b) are
edge-disjoint. By Lemma 1, the even closed walks processed simultaneously in
Steps 3(c)(ii) and 3(c)(iii) are edge-disjoint. Now all the processing involves
applying one of the building blocks, and these blocks preserve membership in
P(G) even if applied simultaneously to edge-disjoint even closed walks. The
statement follows. □

Lemma 5. When the algorithm terminates, we have a vertex of P(G).

Proof. When G is empty, all edges of the original graph have edge weights in
the set {0, 1/2, 1}. Consider the graph H induced by the non-zero edge weights y_e.
From the description of Building Block 1, it follows that H is a disjoint union
of a partial matching (with edge weights 1) and odd cycles (with edge weights
1/2). Such a weight vector must be a vertex of P(G) (see, for instance, [6]). □
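Lemma 5 rests on the structure of the final weights. The sketch below checks that structural condition for a given weight vector. It assumes a simple graph whose edges are keyed by `frozenset({u, v})` with weights stored as `Fraction`s, and it verifies only the combinatorial description used in the proof (weight-1 edges forming a partial matching, weight-1/2 edges forming vertex-disjoint odd cycles, the two parts vertex-disjoint), not vertex-ness of P(G) in general. The function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

HALF = Fraction(1, 2)


def is_matching_plus_odd_cycles(weights: dict) -> bool:
    """weights maps frozenset({u, v}) (u != v) to a weight in {0, 1/2, 1}."""
    one = defaultdict(list)   # adjacency via weight-1 edges
    half = defaultdict(list)  # adjacency via weight-1/2 edges
    for edge, w in weights.items():
        u, v = tuple(edge)
        if w == 0:
            continue
        target = one if w == 1 else half if w == HALF else None
        if target is None:
            return False  # weight outside {0, 1/2, 1}
        target[u].append(v)
        target[v].append(u)
    # A weight-1 edge must be isolated in the support graph.
    if any(len(nbrs) != 1 or u in half for u, nbrs in one.items()):
        return False
    # Every vertex touching weight-1/2 edges has exactly two of them, so each
    # component of those edges is a simple cycle; it must have odd length.
    if any(len(nbrs) != 2 for nbrs in half.values()):
        return False
    seen = set()
    for start in list(half):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in half[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if size % 2 == 0:
            return False
    return True
```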

5 Analysis 

It is clear that each of the basic steps of the algorithm runs in NC. The proof
of Lemma 2 establishes that the while loop inside Step 3(c) runs O(log n) times.
To show that the overall algorithm is in NC, it thus suffices to establish the
following:

Lemma 6. The while loop of Step 3 runs O(log n) times.

Proof. Let F be the maximum number of faces per component of G at the
beginning of Step 3. We show that F decreases by a constant fraction after each
iteration of the while loop of Step 3. Since F = O(n) for planar graphs, it will
follow that the while loop executes at most O(log n) times.

At the start of Step 3, the connected components of G are obtained, and they
are all handled in parallel. Let us concentrate on any one component. Within
each component, unless the component size (and hence the number of faces f in the
embedding of this component) is very small, O(1), a set of Ω(f) edge-disjoint
faces (in fact, f/24 faces) and Ω(f) edge-disjoint simple cycles can be found in
NC. This is established in Lemma 3 of [7], which basically shows that a maximal
independent set amongst the low-degree vertices of the dual is the required set
of faces.
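The face-selection step cited from [7] takes a maximal independent set among low-degree vertices of the dual graph. Since two faces are adjacent in the dual when they share an edge, an independent set of faces is a set of edge-disjoint faces. A minimal sequential sketch follows, assuming the dual is given as an adjacency dict. The degree threshold `max_degree` is a hypothetical parameter (this section does not state the constant used in [7]), and a greedy scan replaces the NC maximal-independent-set routine.

```python
def low_degree_mis(adj: dict[int, list[int]], max_degree: int) -> list[int]:
    """Greedy maximal independent set among dual vertices of degree <= max_degree.

    Dual vertices are faces; adjacent faces share an edge, so the chosen
    faces are pairwise edge-disjoint.
    """
    chosen: list[int] = []
    blocked: set[int] = set()
    for v in sorted(adj):
        if len(adj[v]) <= max_degree and v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbours share an edge with v
    return chosen
```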

So let T be the set of edge-disjoint faces obtained at the beginning of Step
3. If |T| < f/24, then the component is very small, and it can be processed
sequentially in O(1) time. Otherwise, note that after one iteration of the while
loop of Step 3, T is emptied out, so all the clusters in T get processed. A single
processing step handles either one cluster (a cluster with a self-loop) or two
(clusters matched in H), so at least |T|/2 processing steps are executed (not
necessarily sequentially).

Each processing step deletes at least one edge. Let the number of edges
deleted be k ≥ |T|/2. Of these, k_1 are not bridges at the time when they are
deleted and k_2 = k − k_1 are bridges when they are deleted. Each deletion of
a non-bridge merges two faces in the graph. Thus if k_1 ≥ |T|/4, then at least
|T|/4 ≥ f/96 faces are deleted; the number of faces decreases by a constant
fraction. If k_2 ≥ |T|/4, consider the effect of these deletions. Each bridge deleted
is on a path joining two clusters. Deleting it separates these two clusters into
two different components. Thus after k_2 such deletions, we have at least k_2 pairs
of separated clusters. Any connected component in the resulting graph has at
most one cluster from each pair, and hence misses at least k_2 clusters.
Since each cluster contains an odd cycle and hence a face, the number of faces
in the new component is at most f − k_2 ≤ f − |T|/4 ≤ f − f/96. Hence, either
way, the new F is at most 95F/96, establishing the lemma. □
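The final inequality, new F ≤ 95F/96, turns directly into an iteration count. The sketch below, with hypothetical function names, iterates the worst case F ← ⌊95F/96⌋ (F counts faces, so it is an integer) and compares the result with ⌈log_{96/95} F_0⌉, the number of rounds after which (95/96)^t F_0 ≤ 1. With F_0 = O(n) this is the O(log n) bound of Lemma 6, although with a large constant (1/log_2(96/95) ≈ 66).

```python
import math


def worst_case_iterations(f0: int) -> int:
    """Rounds of F <- floor(95 * F / 96) until F <= 1."""
    if f0 < 1:
        raise ValueError("f0 must be positive")
    f, t = f0, 0
    while f > 1:
        f = (95 * f) // 96  # strictly smaller than f for every f >= 1
        t += 1
    return t


def log_bound(f0: int) -> int:
    """ceil(log_{96/95} f0): after this many rounds, (95/96)**t * f0 <= 1."""
    return math.ceil(math.log(f0) / math.log(96 / 95))
```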



6 Discussion 

We show that if the perfect matching polytope M(G) of a planar graph is non-
empty, then finding a vertex of P(G) is in NC. Unfortunately, we still do not
know any way of navigating from a vertex of P(G) to a vertex of M(G). Note
that both our algorithm and that of [7] navigate “outwards”, in a sense, from the
interior of P(G) towards an extremal point. To use this to construct a perfect
matching in NC, we need a procedure that navigates “along the surface” from
a vertex of P(G) to a vertex of its facet M(G).

The work of [7] shows that a vertex of M(G) can be found not only for
bipartite planar graphs but also for bipartite graphs of small (O(log n)) genus. The
ideas needed to extend the planar bipartite case to the small-genus bipartite case
apply here as well; thus we have an NC algorithm for finding a vertex of P(G)
for graphs of genus O(log n).

For perfect matchings in general graphs, search reduces to counting, since 
search is in P while counting is #P-hard even under NC reductions. We believe 
that this holds for many reasonable subclasses of graphs as well; searching for 
a perfect matching in a graph reduces, via NC reductions, to counting perfect 
matchings in a related graph from the same subclass. Proving this at least for
planar graphs would be an important first step. 
Abstract. Given a polygonal region with n vertices, a group of searchers
with vision are trying to find an intruder inside the region. Can the
searchers find the intruder, or can the intruder evade the searchers’ detec-
tion forever? One might expect the answer to depend on the visibility of the
searchers, but we present a quite general result to the contrary. We assume
that the searchers always form a simple polygonal chain within the poly-
gon such that the first searcher moves along the boundary of the polygon
and any two consecutive searchers along the chain are always mutually
visible. Two types of visibility of the searchers are considered: at one
extreme, every searcher has 360° vision (called an ∞-searcher); at the
other extreme, every searcher has one-ray vision (called a 1-searcher). We
show that any polygon searchable by a chain of ∞-searchers is also
searchable by a chain of 1-searchers consisting of the same number
of searchers. Our proof uses simple simulation techniques. The proof
is also interesting from an algorithmic point of view because it allows an
O(n^)-time algorithm for finding the minimum number of 1-searchers (and
thus ∞-searchers) required to search a polygon [9]. No polynomial-time
algorithm for a chain of multiple ∞-searchers was known before, even for a
chain of two ∞-searchers.



1 Introduction 

The visibility-based pursuit-evasion problem is that of planning the motion of
one or more searchers in a polygonal environment to eventually see an intruder
that is capable of moving arbitrarily fast [14,5,2,15,4,8,10,6]. A searcher finds
an intruder if the intruder is within the range of the searcher’s vision sensor
at any moment. This problem can model many practical applications, such as
searching for an intruder in a house, rescuing a victim in a dangerous area, and
surveillance with autonomous mobile robots. The motion plan calculated could
be used by robots or human searchers. Some pursuit-evasion problems take the
intruder’s (evader’s) strategy into account, which makes searching a game between
two parties [12,1,13].

In general, the goal of the visibility-based pursuit-evasion problem is to com-
pute a schedule for the searchers so that any intruder will eventually be detected
by the searchers regardless of the unpredictable trajectory of the intruder; this,
however, is not always achievable. A polygon is said to be searchable by given
searchers if there exists such a schedule. Whether a polygon is searchable or
not depends on many factors, such as the shape of the polygon, the number of
searchers, and the visibility that the searchers have. The visibility of a searcher
can be defined by the number of flashlights. The k-searcher has k flashlights
whose visibility is limited to k rays emanating from its position, where the di-
rections of the rays can be changed continuously at bounded angular rotation
speed. One can envisage a laser beam as an example of a flashlight, which can
detect an intruder crossing it. A searcher with omnidirectional vision, which
can see in all directions simultaneously, is called an ∞-searcher.

One of the most interesting results in previous work is that any
polygon (and its variants) searchable by an ∞-searcher is also searchable by a
2-searcher [2,10,11]. This means that an ∞-searcher and a 2-searcher have the same
search capability. These results have a crucial impact on algorithms, because they
make the polynomial-time algorithms developed for a 2-searcher complete for an
∞-searcher. No polynomial-time algorithm has been developed directly for an
∞-searcher (see [4]).

Most results on the equivalence of search capability rely on characterizations
of the class of searchable polygons (or polygon variants). This implies not
only that the proofs are quite complicated but also that a proof for one setting
cannot easily be adapted to another setting. More specifically, for each search
environment (a polygon or one of its variants), rather different arguments were
used for characterizing the class of searchable ones.

Recently, Suzuki et al. [16] presented a new approach without resorting to
characterizations of searchable polygons. They considered a single searcher mov-
ing along the boundary and showed that an ∞-searcher and a 1-searcher have the
same search capability. Their proof constructs a search diagram of an ∞-searcher
and shows that a 1-searcher can do the same job using topological arguments.
With the same framework, they also showed that any polygon searchable by an
∞-searcher with some restrictions is also searchable by a 2-searcher. The latter
result was independently shown by the authors of the present paper [7].

In this paper we study a more general framework and present a simple proof
of the equivalence of search capability of multiple searchers. We consider a group
of m searchers which form a chain. Let s_1, s_2, …, s_m denote m searchers such
that s_1 moves along the boundary of the polygon and s_i and s_{i+1} are
always mutually visible for 1 ≤ i < m. We call this an m-chain. We consider
two types of visibility of the searchers. At one extreme, the searchers have
omnidirectional vision, so the “capture region” is the union of the visibility
polygons defined by the positions of the searchers. At the other extreme, each
searcher has only one flashlight (that is, it is a 1-searcher) such that s_i’s flashlight
must always illuminate s_{i+1} during the search for 1 ≤ i < m, and only the last
searcher s_m can control its flashlight arbitrarily; in this case, the “capture region”
is the union of the line segments between s_i and s_{i+1} for 1 ≤ i < m and the
flashlight of s_m.




Fig. 1. Examples: 3-chain of ∞-searchers and 3-chain of 1-searchers.



We will show that if a polygon is searchable by an m-chain of ∞-searchers, it
is also searchable by an m-chain of 1-searchers. To the knowledge of the authors,
search equivalence has never been shown for multiple searchers so far. Our proof
uses a direct simulation; we transform a search schedule of an m-chain of ∞-
searchers into that of 1-searchers. The main idea is very simple, and it appears
applicable to other settings as well.

Our model is motivated by some previous results. LaValle et al. [6] considered
the case in which the 1-searcher moves along the boundary of the polygon, and they
presented an O(n^)-time algorithm for finding a search schedule of the 1-searcher.
Their model corresponds to the special case m = 1 of our model. Efrat et al. [3]
considered a model similar to ours. In their model, not only the first searcher
but also the last searcher in the chain must move along the boundary, and the
visible region is the union of the line segments between consecutive searchers.
They presented an O(n^)-time exact algorithm for finding the minimum number
of searchers required to sweep a polygon and an O(n log n)-time approximation
algorithm with additive constant for finding the minimum number of searchers
required to search a polygon. Lee et al. [9] improved the time complexity to
O(n^), and these algorithms can easily be modified to deal with the chain of
1-searchers studied in the present paper.

Without the chain restriction, the problem of finding the minimum number
of ∞-searchers required to search a polygon is NP-hard [4], and no polynomial-
time algorithm is known even for two ∞-searchers. Art-gallery-style bounds for
multiple 1-searchers have also been studied [15].

In applications such as robot surveillance systems, one-ray vision can easily
be implemented by one laser beam (see [6]). Our equivalence proof shows that
in a quite natural setting (a chain of searchers moving inside a polygonal region),
this simple implementation is as powerful as omnidirectional vision, whose
implementation appears to be very expensive.



2 Definitions and Notations 

Let ∂P denote the boundary of a simple polygon P. The searcher and the in-
truder are modeled as points that move continuously within P. Let φ(t) denote
the position of the intruder at time t ≥ 0. It is assumed that φ : [0, ∞) → P is
a continuous function and that the intruder is unpredictable in that he is capable
of moving arbitrarily fast and his path φ is unknown to the searcher.

Two points p and q are visible from each other if the segment pq does not
intersect the exterior of P. For a point q ∈ P, let Vis(q) denote the set of all
points in P that are visible from q.

Let S = (s_1, s_2, …, s_m) denote a chain of m searchers. For a searcher s_i ∈ S,
let γ_i(t) denote the position of the searcher s_i at time t; we assume that
γ_i : [0, ∞) → P is a continuous path. A configuration of S at time t, denoted Γ(t),
is the set of points {γ_i(t) | 1 ≤ i ≤ m}. We say that Γ(t) is legal if γ_1(t) lies on
∂P and γ_i(t) and γ_{i+1}(t) are mutually visible at t for 1 ≤ i < m. If Γ(t) is legal
during the search, then the set of searchers configured by Γ(t) is called an m-chain
of searchers.
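To make the legality condition concrete, the following Python sketch tests mutual visibility and the legality of a configuration in a polygon given as a list of vertices. It assumes general position: the open segment between two searchers contains no polygon vertex. Under that assumption, pq avoids the exterior of P exactly when it properly crosses no edge and its midpoint lies in the closed polygon. The tolerance `EPS` and all names (`visible`, `is_legal`, and so on) are illustrative choices, not part of the paper.

```python
Point = tuple[float, float]
EPS = 1e-12


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 if c lies left of ab)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(p: Point, a: Point, b: Point) -> bool:
    return (abs(orient(a, b, p)) <= EPS
            and min(a[0], b[0]) - EPS <= p[0] <= max(a[0], b[0]) + EPS
            and min(a[1], b[1]) - EPS <= p[1] <= max(a[1], b[1]) + EPS)


def edges(poly: list[Point]) -> list[tuple[Point, Point]]:
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def in_closed_polygon(x: Point, poly: list[Point]) -> bool:
    """Even-odd ray casting; points on the boundary count as inside."""
    inside = False
    for a, b in edges(poly):
        if on_segment(x, a, b):
            return True
        if (a[1] > x[1]) != (b[1] > x[1]):
            xi = a[0] + (x[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if xi > x[0]:
                inside = not inside
    return inside


def visible(p: Point, q: Point, poly: list[Point]) -> bool:
    """Segment pq avoids the exterior of poly (general position assumed)."""
    for a, b in edges(poly):
        if orient(p, q, a) * orient(p, q, b) < 0 and orient(a, b, p) * orient(a, b, q) < 0:
            return False  # pq properly crosses an edge
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return in_closed_polygon(mid, poly)


def is_legal(config: list[Point], poly: list[Point]) -> bool:
    """gamma_1 lies on the boundary and consecutive searchers see each other."""
    if not any(on_segment(config[0], a, b) for a, b in edges(poly)):
        return False
    return all(visible(config[i], config[i + 1], poly) for i in range(len(config) - 1))
```

For example, in a U-shaped polygon a chain whose middle searcher sits in the bottom bar is legal, while two searchers at the tops of the two arms cannot see each other.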

The ∞-searcher has omnidirectional vision. Thus at time t, the m-chain
of ∞-searchers illuminates all points in the union of Vis(γ_i(t)) for all i. The
k-searcher has k flashlights, each of which illuminates the points along one ray
and can rotate at some bounded speed. Formally, the schedule of the k-searcher
s_i is a (k + 1)-tuple (γ_i, τ_1, …, τ_k) such that τ_j : [0, ∞) → [0, 2π] is a continuous
function denoting the direction of the j-th ray, which is a maximal line segment
in P emanating from γ_i(t). We define the first boundary point hit by a ray f
as shot(f). The m-chain of 1-searchers illuminates the points in the union of the
segments γ_i(t)γ_{i+1}(t) for 1 ≤ i < m and the segment γ_m(t)shot(f(t)) for the
flashlight f(t) of s_m at t.
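shot(f) can be computed by ray casting: intersect the ray with every polygon edge and keep the nearest hit at positive distance. The sketch below assumes the ray points into the polygon (so that the first hit is the far endpoint of the maximal segment in P) and ignores edges parallel to the ray; the function name and tolerances are hypothetical.

```python
import math

Point = tuple[float, float]


def shot(origin: Point, angle: float, poly: list[Point], eps: float = 1e-9):
    """First boundary point hit by the ray from `origin` at `angle` (radians), or None."""
    dx, dy = math.cos(angle), math.sin(angle)
    best_t = math.inf
    n = len(poly)
    for i in range(n):
        (ax, ay), (bx, by) = poly[i], poly[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex        # cross(direction, edge)
        if abs(denom) < 1e-15:
            continue                     # ray parallel to this edge: ignored here
        wx, wy = ax - origin[0], ay - origin[1]
        t = (wx * ey - wy * ex) / denom  # distance along the ray
        s = (wx * dy - wy * dx) / denom  # position along the edge, in [0, 1]
        if t > eps and -eps <= s <= 1 + eps and t < best_t:
            best_t = t
    if best_t == math.inf:
        return None
    return (origin[0] + best_t * dx, origin[1] + best_t * dy)
```

The `t > eps` test skips the origin itself, which matters for s_1, whose position always lies on ∂P.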




Fig. 2. Snapshots of a search path of the 2-chain (s_1, s_2) of ∞-searchers. The solid
curve represents the path of s_2 and the dotted curve along ∂P denotes the path of s_1.



During a search, a point p ∈ P is said to be contaminated at time t (≥ 0)
if there exists a continuous function φ such that φ(t) = p and φ(t') is not
illuminated at any t' ∈ [0, t]. A point that is not contaminated is said to be clear.
A region R ⊆ P is contaminated if it contains a contaminated point; otherwise,
it is clear. In the rest of the paper, dark gray indicates contaminated regions
and white indicates clear regions in the illustrations. If P is cleared at some time
t by an m-chain of ∞-searchers moving along (γ_1, γ_2, …, γ_m), then the set of
paths (γ_1, γ_2, …, γ_m) is called a search path. A schedule for an m-chain of k-
searchers (k ≥ 1) is called a search schedule if P is cleared at some time t during
its execution. Figure 2 depicts a search path of the 2-chain of ∞-searchers.
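The definition of contamination can be simulated on a discretized polygon. The sketch below is a simplification under stated assumptions: P is divided into cells described by a hypothetical adjacency dict, the illuminated set is constant on each interval (k − 1, k], and the intruder, being arbitrarily fast, can reach any non-illuminated cell connected to a contaminated one within a single step. The second test shows recontamination: the same sweep that clears a corridor fails on a ring. Function names are hypothetical.

```python
from collections import deque


def propagate(adj: dict, contaminated: set, illuminated: set) -> set:
    """One step: illuminated cells are cleared, then contamination spreads through
    every non-illuminated cell reachable from a still-contaminated cell."""
    frontier = deque(c for c in contaminated if c not in illuminated)
    result = set(frontier)
    while frontier:
        c = frontier.popleft()
        for d in adj[c]:
            if d not in illuminated and d not in result:
                result.add(d)
                frontier.append(d)
    return result


def cleared_at(adj: dict, steps: list) -> "int | None":
    """First step index at which no cell is contaminated, or None."""
    # At time 0 every cell that is not illuminated may hold the intruder.
    contaminated = set(adj) - set(steps[0])
    if not contaminated:
        return 0
    for k in range(1, len(steps)):
        contaminated = propagate(adj, contaminated, set(steps[k]))
        if not contaminated:
            return k
    return None
```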

For points p, q ∈ ∂P, let [p, q] denote the connected boundary chain from p to
q in the clockwise direction. As usual, (p, q) denotes the open chain [p, q] \ {p, q}.



3 Sketch of Proof 

Suppose that a polygon P is searchable by an m-chain of ∞-searchers. Let S_∞ =
(γ_1, γ_2, …, γ_m) denote a search path for P. To show that P is also searchable by
an m-chain of 1-searchers, we will go through two phases of transformation: first,
we will transform S_∞ into a search schedule S_k of an m-chain of k-searchers,
where k = O(n), and then S_k will be transformed into a search schedule S_1
of an m-chain of 1-searchers. The basic idea is to simulate the search process using
fewer flashlights. More detailed outlines follow.

During the search by S_∞, P is partitioned into some clear regions and some
contaminated regions. Thus the boundary of P consists of alternating clear and
contaminated chains, so that each clear chain lies between two contaminated
chains. For the time being, let us focus our attention on the boundary of P. If a
boundary chain is clear under S_∞ and maximal at time t, we call it an MC-chain
at t.
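If ∂P is sampled at points in clockwise order, the MC-chains at a fixed time are the maximal circular runs of clear samples. The following is a minimal sketch under that discretization assumption (the function name is hypothetical); each run's two endpoints are where a pair of effective flashlights would aim.

```python
def mc_chains(clear: list[bool]) -> list[tuple[int, int]]:
    """Maximal clear runs of a cyclic sequence of boundary samples.

    clear[i] tells whether sample i (in clockwise order) is clear. Each run is
    returned as (first, last) sample index; a run may wrap past index n - 1.
    """
    n = len(clear)
    if not any(clear):
        return []
    if all(clear):
        return [(0, n - 1)]  # the whole boundary is clear
    # Start right after a contaminated sample so that no run is cut in two.
    start = next(i for i in range(n) if clear[i] and not clear[i - 1])
    chains = []
    i = 0
    while i < n:
        if clear[(start + i) % n]:
            k = i
            while k + 1 < n and clear[(start + k + 1) % n]:
                k += 1
            chains.append(((start + i) % n, (start + k) % n))
            i = k + 1
        else:
            i += 1
    return chains
```

In the first test below, samples 5 and 0 form a single chain that wraps around the starting point of the sampling, which is why the scan starts just after a contaminated sample.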

Now we will use an m-chain of k-searchers for some k = O(n) to maintain the
endpoints of MC-chains, which keep moving on ∂P with time. The endpoints of
all MC-chains will be tracked by some flashlights of the k-searchers, which we
call effective flashlights. Each MC-chain is bounded by a pair of effective flash-
lights; from the viewpoint of the k-searcher who carries them, one is called
the left flashlight and the other the right flashlight. During this simulation,
MC-chains change with time, and the effective flashlights change accordingly; they
may appear, move, become non-effective, or disappear. A detailed explanation
is given in Section 4.1.

The second phase is to construct S_1 from the dynamics of the effective flashlights
of the k-searchers. This will be explained in Section 4.2 for the simple case of m = 1
and in Section 5 for the general case (m ≥ 2).

Throughout the simulation, we concentrate on the boundary chains, which
may suggest that we deal only with intruders moving on the boundary of the
polygon. In the following sections, however, we will describe the movements of
effective flashlights and the process of their simulation, from which one can see
how the inside of the polygon is cleared along with its boundary.
4 Transformation with a Single Searcher

In this section, we consider a single searcher only (m = 1). This will provide a
basis for the later proof for the general case (m ≥ 2), which is more involved.

Suppose that one ∞-searcher is searching a polygon from the boundary. First,
we simulate it using a k-searcher, for some k = O(n). Let γ denote the search
path of the ∞-searcher and let S_k denote a schedule of the k-searcher. We will
show how to control the effective flashlights in S_k, and then we construct a search
schedule S_1 of the 1-searcher.



4.1 Scheduling a k-Searcher

During the search following γ, MC-chains and contaminated chains appear alter-
nately on ∂P. We aim to keep track of the endpoints of MC-chains by the flashlights
of the k-searcher so that we can replace the ∞-searcher by the k-searcher. Let
the path of the k-searcher be exactly the same as that of the ∞-searcher, that is, γ.
Let γ(t) be the position of the k-searcher at time t. It remains to specify the
movements of the flashlights of the k-searcher.

Let us focus on an MC-chain C(t) = [a(t), b(t)] at time t, where a(t) and b(t)
are functions denoting the endpoints of C(t). Now we consider the dynamics of
two effective flashlights associated with C(t). Suppose the MC-chain appears at
t_1 for the first time. Then C(t_1) is entirely visible from the searcher (see the thick
chain in Figure 3a). To simulate C(t_1), the k-searcher turns on two flashlights,
aims them at some point in C(t_1), and rotates them in opposite directions until
the shots of the flashlights reach a(t_1) and b(t_1), respectively. Then these two
flashlights are the effective ones for C. When a(t) and b(t) move continuously on
∂P, the shots of the two effective flashlights move exactly as they do (see
Figure 3b).

Now consider the cases when an endpoint of the MC-chain changes abruptly,
not continuously. First suppose the chain C is extended abruptly, specifically
augmented by a chain C'. We can handle this extension by creating C' and merging it
with C. Without loss of generality, assume that C' is to the right of C (the left case is
symmetric). Then the right flashlight of C will meet the left flashlight of C'. The
k-searcher turns off the two meeting flashlights and keeps the two remaining ones (the
left flashlight of C and the right one of C') as the effective flashlights for the
extended chain.

Secondly, suppose the chain C shrinks abruptly. This is the case when a
part of the clear chain suddenly goes out of sight and is merged into some
contaminated chain. In Figure 3c, C (the thick chain) gets shortened when the
searcher moves clockwise from γ at time t. The new right endpoint of C should
be the point in C(t) that is visible from the new position of the searcher, say γ',
and that is met first when we traverse ∂P counterclockwise from the old b(t) (b for
short). If such a point is not found, the chain C disappears. Otherwise, let b' be
such a point. It is easily seen that the three points γ, b and b' are collinear,
because the chain (b', b) is invisible from γ' and the searcher can move only at
some bounded speed. Since the chain (b', b) was clear right before t, we can
regard the shot of the flashlight as jumping from b to b' at t.

Fig. 3. (a) C(t_1) is visible from γ(t_1). (b) The effective flashlight that defines the right
endpoint of C(t) moves continuously. (Note that a part of C(t) is not visible from γ(t)
at the moment.) (c) If the searcher moves any further in a clockwise direction, the right
endpoint of C becomes b' and C shrinks.

To summarize, each effective flashlight uses two kinds of basic moves besides 
turn-on and turn-off: 

1. Sweep: The shot of the flashlight moves continuously along the boundary of the
polygon (for example, the right endpoint of C(t) in Figure 3b).

2. Backward jump: The shot of the flashlight jumps from the current point b to
another boundary point b' along the line extending the flashlight, where
the new shot b' lies in the MC-chain before the jump, so the MC-chain
shrinks (Figure 3c). (“Backward” means that the endpoint of the MC-chain
C retreats and C loses part of its clear area.)

Notice that a backward jump is defined only when one side of the flashlight 
is clear. (If it is the left (or right) flashlight, the right (or left) side is clear.) 

Any movement that can be represented as a sequence of sweeps and backward
jumps is said to be valid. The life-cycle of an effective flashlight f is as follows: it
is turned on; f uses valid movements; it is turned off (at a merge or at disappearance).
The next lemma plays an important role in later proofs.

Lemma 1. Suppose a movement of a flashlight is valid with one side of the 
flashlight clear. Then the reverse execution of this movement is also valid pro- 
vided the other side is clear. 

Proof: The reverse execution of a sweep is also a sweep, so it is valid whichever
side is clear. Consider a backward jump with the left side of the flashlight
clear (the right side is symmetric). Suppose the shot of a flashlight f jumped
from b to b' as in Figure 3c. Now assume that f currently aims at b' with the
right side of f clear. The reverse execution would make the shot jump from
b' to b. Since b belongs to the clear side before the jump, the new right side
of f is still clear after the jump. Because we can reverse each basic move, the
reverse execution of a valid movement is still valid with the opposite side clear. □
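Lemma 1 can be mirrored in bookkeeping: a movement is a sequence of basic moves, and its reverse execution runs those moves in the opposite order, swaps each move's start and end, and requires the opposite side to be clear. The sketch below records only that bookkeeping, with boundary positions as abstract numbers; it does not check geometric validity, and the `Move` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    kind: str        # "sweep" or "jump" (a backward jump)
    start: float     # boundary parameter of the shot before the move
    end: float       # boundary parameter of the shot after the move
    clear_side: str  # "left" or "right": the side assumed clear during the move


OTHER_SIDE = {"left": "right", "right": "left"}


def reverse_movement(moves: list[Move]) -> list[Move]:
    """Run the moves backwards in time; by Lemma 1 the opposite side must be clear."""
    return [Move(m.kind, m.end, m.start, OTHER_SIDE[m.clear_side])
            for m in reversed(moves)]


def is_continuous(moves: list[Move]) -> bool:
    """Each move starts where the previous one ended."""
    return all(a.end == b.start for a, b in zip(moves, moves[1:]))
```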
4.2 Scheduling a 1-Searcher

In S_k, the flashlights of the k-searcher keep track of the endpoints of each MC-
chain formed by S_∞. For convenience, we are going to maintain two special
MC-chains adjacent to the position γ(t) of the ∞-searcher: one on the clock-
wise side, denoted by L, and the other on the counterclockwise side,
denoted by R. It is possible that the neighborhood of γ(t) on both the clockwise
and the counterclockwise side is clear; in this case, imagine that we simply split the
clear chain at γ(t). Even in the case that L = R = γ(t), we treat L and R
as distinct. Thus L and R exist from the beginning of the search and will be
merged only when L ∪ R becomes ∂P.

The key to the simulation by the 1-searcher is how to handle the merge of
MC-chains with only one flashlight. Suppose that, according to S_∞, L is first
merged at time T with some MC-chain D that is clockwise to L. So L ∩ D is
one point at T. Assume that D appeared for the first time at t_1, where t_1 < T.
Recall that in S_k, L and D are bounded by effective flashlights of the k-searcher.
The 1-searcher can merge L and D in the following way (see Figure 4). The
1-searcher has a single flashlight, say F. First, F simulates the right flashlight
of L for the time interval [0, T] in S_k. Next, F simulates the left flashlight of D
during the time interval [t_1, T] in S_k in reverse; this second simulation proceeds
in the time direction opposite to the schedule S_k of the k-searcher. This reverse
simulation consists of valid movements by Lemma 1. Finally, F simulates the
right flashlight of D for the time interval [t_1, T] in S_k. In this way, the 1-searcher
can merge two MC-chains using one flashlight.




Fig. 4. Merging chains L and D in S_1.



In the above example, D may have been created by the merge of two (or more)
chains, say D_1 and D_2. Still, the flashlight F can simulate D_1 and D_2 recursively,
one by one, in the same way as for L and D. The above argument is summarized
in the following lemma.

Lemma 2. Suppose L is merged with the chain D during S_k. Then the 1-
searcher can merge L and D using the reverse execution.
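The recursive merge behind Lemma 2 can be written as a plan of time intervals, one per simulated flashlight of S_k, where an interval whose start time exceeds its end time is a reverse execution. The following sketch is one reading of the recursion, not the authors' construction. It assumes each chain records either its appearance time or the time it was formed by merging a left and a right sub-chain, and that after a merge the left (right) flashlight of the combined chain is that of its left (right) sub-chain. All names are hypothetical. For a chain D that appeared on its own, the plan is exactly the three steps described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    name: str
    birth: float                     # appearance time (leaf) or merge time
    left: Optional["Chain"] = None   # sub-chain holding the left endpoint
    right: Optional["Chain"] = None  # sub-chain holding the right endpoint


Segment = tuple[str, float, float]   # (flashlight of S_k, from_time, to_time)


def leftmost(c: Chain) -> Chain:
    while c.left is not None:
        c = c.left
    return c


def rightmost(c: Chain) -> Chain:
    while c.right is not None:
        c = c.right
    return c


def absorb(c: Chain, T: float) -> list[Segment]:
    """Move F from c's left endpoint to its right endpoint, starting and ending at time T."""
    if c.left is None or c.right is None:  # c appeared on its own at c.birth
        return [(f"{c.name}.left", T, c.birth), (f"{c.name}.right", c.birth, T)]
    tm = c.birth                           # c was formed by merging its sub-chains at tm
    return ([(f"{leftmost(c).name}.left", T, tm)]
            + absorb(c.left, tm)
            + absorb(c.right, tm)
            + [(f"{rightmost(c).name}.right", tm, T)])


def merge_schedule(l_name: str, d: Chain, T: float) -> list[Segment]:
    """Plan for F: follow L's right flashlight up to T, then sweep across D."""
    return [(f"{l_name}.right", 0.0, T)] + absorb(d, T)
```

Consecutive intervals share their boundary times, which reflects that the 1-searcher moves continuously along γ, forward or backward in simulated time.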

The whole schedule S_1 is constructed by applying Lemma 2 repeatedly: the
1-searcher continuously moves along the path γ of the k-searcher, either forward
or backward according to the time direction in which it is being simulated, and
the flashlight of the 1-searcher merges MC-chains one by one until their union
amounts to ∂P. During the whole simulation, we keep the left side of the flashlight
of the 1-searcher clear.

5 Transformation with Multiple Searchers 

Suppose that a polygon P is searchable by an m-chain of ∞-searchers for m ≥ 2.
Let S_∞ = (γ_1, γ_2, …, γ_m) denote a search path of the m-chain of ∞-searchers.
Let S_k denote the search schedule of an m-chain of k-searchers and let S_1 denote
that of an m-chain of 1-searchers. We denote the i-th ∞-searcher by s_i, the i-th
k-searcher by s_i^k, and the i-th 1-searcher by s_i^1.

Constructing S_k. S_k is constructed in the following way: the configu-
ration of the chain of k-searchers is exactly the same as that of the ∞-searchers, and
for each i, the k-searcher s_i^k simulates the ∞-searcher s_i by maintaining effective
flashlights for each MC-chain, as shown in Section 4.1. To be exact, an MC-chain
here is somewhat different because we are dealing with multiple searchers. Some
maximal clear chains may be bounded by two effective flashlights that belong
to two different searchers, which implies that such chains cannot be maintained
by a single searcher. Thus, we define an MC_i-chain as a maximal clear chain
being maintained by s_i^k, which is called a local MC-chain. We will use the term
MC-chain to refer to a globally maximal clear chain. Therefore, an MC-chain
is composed of one or more local MC-chains, and S_k is the aggregation of the
simulations conducted by s_i^k for all i ∈ {1, 2, …, m}.

Interaction among multiple searchers. Multiple searchers cause more complications because of their interaction. Let us discuss the cases that cannot be handled by the procedure given in Section 4; they can be summarized as the following three situations:

1. An MC-chain C consists of an $MC_i$-chain and an $MC_j$-chain for $i \neq j$.

2. Two MC-chains are merged, and the two meeting flashlights belong to two different searchers.

3. An effective flashlight of an MC-chain is switched from $s_i^k$'s to $s_j^k$'s ($i \neq j$).

The first and the second cases can be handled by certain kinds of merging, which will be explained later; the third one is called hand-over. Let us call the most clockwise point of a local or global MC-chain its cw point. Because a global MC-chain is the union of some local MC-chains, the role of effective flashlight can be handed over from one searcher to another, according to the dynamic changes of the local MC-chains. Suppose that the cw point of an MC-chain C is currently defined by a flashlight f of $s_i^k$ (the counterclockwise case is symmetric). If f performs a sweep or a backward jump in the counterclockwise direction, passing through or over another cw point, then the cw point of C is switched to the cw point x that is first encountered by f moving counterclockwise. If this new cw point x is defined by a flashlight of a searcher $s_j^k$, the role of the effective flashlight is handed over from $s_i^k$ to $s_j^k$. On the other hand, hand-over can also be caused by a third searcher whose flashlight f' is moving clockwise: if f' performs a sweep in the clockwise direction, passing through f, then f' becomes the new effective flashlight instead of f. Figure 5 shows typical examples of hand-over. In Figure 5a, as the searcher $s_i^k$ moves upwards, the cw point of an MC-chain C, which has been defined by the flashlight of $s_i^k$ (whose shot is h), becomes defined by the flashlight of $s_j^k$ (whose shot is h'). In Figure 5b, as the third searcher moves upwards, its flashlight becomes a new effective flashlight for the chain C, which induces the extension of C.

The following observation is important when we construct $S_1$: every time a hand-over occurs, a local MC-chain either changes smoothly or shrinks abruptly (in other words, its flashlights perform a sweep or a backward jump). Thus, if the right (or left) endpoint of an MC-chain is handed over from the flashlight f of $s_i^k$ to the flashlight f' of $s_j^k$, then f and f' intersect, and the shot of f' is to the left (or right) of, or equal to, that of f before the hand-over.





Fig. 5. Examples of hand-over. 



Constructing $S_1$. Now we construct $S_1$, the search schedule of the 1-searchers. Recall that in $S_1$, the flashlight of every searcher except $s_m^1$ should be aimed at the neighboring searcher (on the right side). With the flashlight of $s_m^1$, we should simulate the dynamics of the (global) MC-chains. Roughly speaking, the simulation is done by following the history of each MC-chain and merging the chains one by one. As in the case m = 1, we merge MC-chains one by one in the clockwise direction, and the flashlight of the last 1-searcher $s_m^1$, denoted by F, keeps track of each MC-chain. When we simulate a left flashlight of an MC-chain, the simulation is done in reverse. Before we show how to manage hand-over, merge and initialization, let us give the configuration of an m-chain of 1-searchers.

We assign the configuration of an m-chain of 1-searchers as follows. Let $\gamma(s_i)$ denote the position of $s_i^k$ at a given moment. While F simulates the flashlight of $s_i^k$ for some i, the 1-searchers are located on a minimum-link path π (that is, a path with the minimum number of turns) from $\gamma(s_1)$ to $\gamma(s_i)$. Specifically, if π consists of l segments, we assume that $s_1^1$ is located at the same position as $s_1^k$, that $s_2^1, s_3^1, \ldots, s_l^1$ are located at the turning points of π, and that the remaining $m - l$ 1-searchers are located at the position of $s_i^k$. The existence of such a minimum-link configuration with at most $m - 2$ turns is implied by the configuration of the ∞-searchers. When processing hand-over and merge, the 1-searchers change their configuration accordingly.
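To make the placement rule concrete, here is a minimal Python sketch. It assumes that the minimum-link path π has already been computed and is given as its list of vertices (γ(s_1), then the turning points, then γ(s_i)); the function name `place_one_searchers` and the tuple representation of points are hypothetical, and computing π itself is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def place_one_searchers(path: List[Point], m: int) -> List[Point]:
    """Hypothetical helper: positions of s_1^1, ..., s_m^1 on a minimum-link path.

    `path` lists the vertices of pi: gamma(s_1) first, then the turning
    points, then gamma(s_i) last, so a path with l segments has l + 1 vertices.
    """
    if not path or m < 1:
        raise ValueError("need a non-empty path and at least one searcher")
    segments = len(path) - 1                  # l
    if segments == 0:
        return [path[0]] * m                  # gamma(s_1) == gamma(s_i): all searchers coincide
    if segments - 1 > m - 2:                  # number of turns (l - 1) must be at most m - 2
        raise ValueError("configuration needs at most m - 2 turns")
    positions = [path[0]]                     # s_1^1 shares the position of s_1^k
    positions += path[1:-1]                   # s_2^1, ..., s_l^1 at the l - 1 turning points
    positions += [path[-1]] * (m - segments)  # remaining m - l searchers at the position of s_i^k
    return positions
```

For example, with m = 4 and an L-shaped path [(0, 0), (1, 0), (1, 1)] (two segments, one turn), the searchers are placed at (0, 0), (1, 0), (1, 1), (1, 1).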

F simulates hand-over as follows. Without loss of generality, we focus on the right endpoint of an MC-chain. Suppose that hand-over occurs from flashlight f of $s_i^k$ to flashlight f' of $s_j^k$. Right before the simulation of this hand-over, the 1-searcher $s_m^1$ must lie at the position of $s_i^k$. Now $s_m^1$ moves to the position of $s_j^k$ along the shortest path from $s_i^k$ to $s_j^k$, aiming F at the intersection point c of f and f'. It is easily seen that F uses valid movements during this simulation, since hand-over occurs only when the MC-chain shrinks or does not change at the moment of hand-over, and the shot of f' is to the left of, or equal to, that of f. (During this simulation, the chain of 1-searchers is morphed into the minimum-link configuration from $\gamma(s_1)$ to $\gamma(s_j)$ and then continues the simulation. Such a morphing can be done by using the routine described in [9].) When hand-over occurs between two flashlights illuminating the same point, its simulation is a special case of the above one. A merge of two MC-chains involving two different searchers can be simulated in a similar way to hand-over, because the shots of the corresponding flashlights meet at one point.

Finally, when a new MC-chain C appears (in initialization), C should have a valid history for the simulation. When C appears, every point of this chain must be visible from some ∞-searcher. We divide C into a number of sub-chains such that each sub-chain is entirely visible from one ∞-searcher. We imagine that each sub-chain was cleared by the two flashlights of one k-searcher and that C was made by merging these sub-chains. In $S_1$, it suffices to simulate this imaginary history.

During the execution of $S_1$, we maintain the following invariant: if the m-chain of 1-searchers is oriented from $s_1^1$ to $s_m^1$, the left side of the chain of m flashlights is always clear. Therefore all intruders, not only the ones on $\partial V$, will be detected by $S_1$. From the whole simulation process, we obtain the following theorem.

Theorem 3. Any polygon that is searchable by an m-chain of ∞-searchers is also searchable by an m-chain of 1-searchers.



References 

1. M. Adler, H. Räcke, N. Sivadasan, C. Sohler, and B. Vöcking. Randomized Pursuit-Evasion in Graphs. In Proc. of the 29th Int. Colloquium on Automata, Languages and Programming, pages 901-912, 2002.

2. D. Crass, I. Suzuki, and M. Yamashita. Searching for a mobile intruder in a corridor: the open edge variant of the polygon search problem. Int. J. of Comp. Geom. and Appl., 5(4):397-412, 1995.

3. A. Efrat, L.J. Guibas, S. Har-Peled, D. Lin, J. Mitchell, and T. Murali. Sweeping simple polygons with a chain of guards. In Proc. of the 16th Symp. on Discrete Algorithms, pages 927-936, 2000.



Equivalence of Search Capability Among Mobile Guards 495 



4. L.J. Guibas, J.C. Latombe, S.M. LaValle, D. Lin, and R. Motwani. A visibility-based pursuit-evasion problem. Int. J. of Comp. Geom. and Appl., 9(4):471-493, 1998.

5. C. Icking and R. Klein. The two guards problem. Int. J. of Comp. Geom. and Appl., 2(3):257-285, 1992.

6. S. M. LaValle, B. H. Simov, and G. Slutzki. An algorithm for searching a polygonal region with a flashlight. In Proc. of the 16th Symp. on Comp. Geom., pages 260-269, 2000.

7. Jae-Ha Lee, Sang-Min Park, and Kyung-Yong Chwa. On the polygon-search conjecture. Technical Report TR-2000-157, CS department, KAIST, 2000.

8. Jae-Ha Lee, Sang-Min Park, and Kyung-Yong Chwa. Searching a polygonal room with one door by a 1-searcher. Int. J. of Comp. Geom. and Appl., 10(2):201-220, 2000.

9. Jae-Ha Lee, Sang-Min Park, and Kyung-Yong Chwa. Optimization algorithms for sweeping a polygonal region with mobile guards. In Proc. of the 12th Int. Symp. on Algorithms and Computation, pages 480-492, 2001.

10. Jae-Ha Lee, Sung Yong Shin, and Kyung-Yong Chwa. Visibility-based pursuit-evasion in a polygonal room with a door. In Proc. of the 15th ACM Symp. on Comp. Geom., pages 281-290, 1999.

11. Sang-Min Park, Jae-Ha Lee, and Kyung-Yong Chwa. Visibility-based pursuit-evasion in a polygonal region by a searcher. In Proc. of the 28th Int. Colloquium on Automata, Languages and Programming, pages 456-468, 2001.

12. T. D. Parsons. Pursuit-evasion in a graph. In Theory and Applications of Graphs, pages 426-441, 1976.

13. Günter Rote. Pursuit-evasion with imprecise target location. In Proc. of the 14th ACM-SIAM Symp. on Discrete Algorithms, pages 747-753, 2003.

14. I. Suzuki and M. Yamashita. Searching for a mobile intruder in a polygonal region. SIAM J. Comp., 21(5):863-888, 1992.

15. M. Yamashita, H. Umemoto, I. Suzuki, and T. Kameda. Searching for mobile intruders in a polygonal region by a group of mobile searchers. Algorithmica, 31(2):208-236, 2001.

16. M. Yamashita, H. Umemoto, I. Suzuki, and T. Kameda. Searching a polygonal region from the boundary. Int. J. of Comp. Geom. and Appl., 11(5):529-553, 2001.



Load Balancing in Hypercubic Distributed Hash 
Tables with Heterogeneous Processors* 



Junning Liu and Micah Adler 

Department of Computer Science, University of Massachusetts, Amherst, MA 
01003-4610, USA. {liujn, micah}@cs.umass.edu



Abstract. There has been a considerable amount of recent research on 
load balancing for distributed hash tables (DHTs), a fundamental tool in 
Peer-to-Peer networks. Previous work in this area makes the assumption 
of homogeneous processors, where each processor has the same power. 
Here, we study load balancing strategies for a class of DHTs, called hypercubic DHTs, with heterogeneous processors. We assume that each processor has a size, representing its resource capabilities, and our objective
is to balance the load density (load divided by size) over the processors 
in the system. Our main focus is the offline version of this load balancing 
problem, where all of the processor sizes are known in advance. This re- 
duces to a natural question concerning the construction of binary trees. 
Our main result is an efficient algorithm for this problem. The algorithm 
is simple to describe, but proving that it does in fact solve our binary 
tree construction problem is not so simple. We also give upper and lower 
bounds on the competitive ratio of the online version of the problem. 



1 Introduction 

Structured Peer-to-Peer (P2P) systems have been increasingly recognized as the next-generation application mode of the Internet. Most of them, such as CAN [17], Chord [19], Pastry [18] and Tapestry [21], are distributed hash tables (DHTs), which determine where to store a data item by hashing its name to a prespecified address space; this address space is partitioned across the processors of the P2P system in a manner that allows a specific hash address to be found relatively easily. The aspect of DHT systems that motivates our work is load balancing. This is crucial to a DHT: a major design goal of P2P systems is to provide a scalable distributed system with high availability and small response time.

In this paper, we consider load balancing in a DHT consisting of heteroge- 
neous processors. While most previous work has analyzed DHTs under the as- 
sumption that all processors are identical, real DHTs consist of processors with 
significantly different characteristics in terms of computational power, band- 
width availability, memory, etc. One way to extend results for the homogeneous 
case to heterogeneous systems is to use virtual processors: powerful processors

* This work was supported by NSF grants EIA-0080119, CCR-0133664, and ITR-0325726.
pretend to be multiple less powerful processors. This approach has the drawback that a processor must maintain many pointers to other processors for each virtual node it represents. Data migration is another technique for dealing with heterogeneous processors. Data migration can be a good approach when data is highly dynamic, but is usually unfavorable when processors are highly dynamic. When processors arrive and leave frequently, the migration process may have a cascading effect of migration [3] that further degrades bandwidth usage and data consistency.

This paper tries to balance the load without using virtual servers or migration methods. We study one variant of the DHT, the hypercubic hash table (HHT), a variant of CAN [17] described in [2]. The HHT partitions the address space using a full binary tree. The leaves of the tree correspond to the processors of the DHT. We assign a 0 to every left branch in the tree and a 1 to every right branch. Each processor stores all hash addresses whose prefix matches the string obtained by following the path from the root to the leaf for that processor; thus the fraction of the address space stored at a processor i is $1/2^k$, where k is the distance in the tree from i to the root.

In the hypercubic hash table, the arrival of a new processor i is handled by 
splitting an existing leaf node j. To do so, the leaf node for j is replaced by an 
internal node with two children i and j. The address space is then reallocated 
as appropriate for this change. Deletions are handled in an analogous manner. 
[2] analyzed distributed techniques for choosing which node of the tree to split 
in order to keep the resulting tree as balanced as possible. In particular, the 
objective was to minimize the ratio between the maximum fraction of the address space stored at any processor and the minimum fraction of the address space stored at any processor.¹
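The HHT bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: each processor is identified by a string name and its leaf by its root-to-leaf bit string, and since the text does not say which child the arriving processor takes, giving it the right child is an arbitrary choice here. The helper names `split_leaf`, `address_fraction` and `owner` are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def split_leaf(prefixes: Dict[str, str], j: str, i: str) -> None:
    """New processor i arrives by splitting the leaf of processor j.

    `prefixes` maps each processor to its root-to-leaf path (0 = left, 1 = right).
    Arbitrary choice for this sketch: j keeps the left child, i takes the right.
    """
    path = prefixes[j]
    prefixes[j] = path + "0"
    prefixes[i] = path + "1"


def address_fraction(prefixes: Dict[str, str], p: str) -> Fraction:
    """Fraction of the address space stored at p: 1 / 2**(depth of p's leaf)."""
    return Fraction(1, 2 ** len(prefixes[p]))


def owner(prefixes: Dict[str, str], address: str) -> str:
    """Processor whose prefix matches the hash address (given as a bit string)."""
    for p, path in prefixes.items():
        if address.startswith(path):
            return p
    raise KeyError(address)
```

Starting from a single processor "a" at the root and letting "b" split "a" and then "c" split "b" yields prefixes "0", "10" and "11", so "a" stores half of the address space and "b" and "c" a quarter each.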

For heterogeneous processors, we want to store larger fractions of the address space on some of the processors, and thus we no longer want as balanced a tree as possible. We assume that every processor i has a single measure of its power, which we call its size, and denote it by $2^{s_i}$ (it will be more convenient for us to use the logarithm of the sizes). Instead of balancing the load across the processors, we now wish to construct a tree that balances the load density, where the load density on processor i is defined to be $ld_i = \frac{L}{2^{s_i + l_i}}$, where L is the total load and $l_i$ is the height (distance from the root) of node i in the tree. Our goal is to minimize the quantity $\frac{\max_i ld_i}{\min_i ld_i}$. This criterion is a natural extension of the load balancing criterion of [2], and it captures the application's major requirements. An alternative criterion would be to minimize the maximum load density. A drawback to considering only the maximum load density is that it can result in processors with a large size not having a large fraction of the address space: i.e., powerful processors may be underutilized. In fact, our main result demonstrates that it is possible to minimize both criteria simultaneously: we


¹ When the number of data items is large compared to the number of processors, and
the data items have approximately the same size, this will be a good estimate of the 
load balance. The case of heterogeneous processors where the number of data items 
is close to the number of processors has been studied in [5]. 
describe an algorithm that finds the tree with the minimum load density ratio, but this tree also minimizes the maximum load density.

While the eventual goal of a load balancing technique would be to develop 
a distributed and online algorithm, it turns out that minimizing the load den- 
sity ratio with heterogeneous processors is challenging even in a centralized and 
offline setting. Thus, in this paper we focus on that setting; developing dis- 
tributed algorithms that utilize the techniques we introduce for the centralized 
setting is an interesting and important open problem. The centralized problem 
reduces to the following simple-to-state algorithmic problem: given a set of nodes $S = \{p_1, p_2, \ldots, p_n\}$ with weights $s_1, \ldots, s_n$, construct a full binary tree with leaf set S. Let the density of node i in this tree be $s_i + l_i$, where $l_i$ is the depth of node i. The binary tree should minimize the difference between the maximum density of any node and the minimum density of any node. This is a natural question concerning the construction of binary trees, and thus we expect that algorithms for this problem will have other applications.
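As a short worked derivation from the definitions above (writing $d_i = s_i + l_i$ for the density of processor i, and using that processor i stores the fraction $2^{-l_i}$ of the total load L and has size $2^{s_i}$), the load density ratio is exactly 2 raised to the density difference:

$$ld_i = \frac{L \cdot 2^{-l_i}}{2^{s_i}} = \frac{L}{2^{s_i + l_i}} = \frac{L}{2^{d_i}}, \qquad \frac{\max_i ld_i}{\min_i ld_i} = \frac{L / 2^{\min_i d_i}}{L / 2^{\max_i d_i}} = 2^{\max_i d_i - \min_i d_i}.$$

Since $x \mapsto 2^x$ is increasing, minimizing the load density ratio is the same as minimizing the density difference, which is why the centralized problem can be stated purely in terms of $s_i + l_i$.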

This algorithmic problem is reminiscent of building optimal source codes; 
for that problem Huffman’s algorithm (see [6] for example) provides an optimal 
solution. In fact, if we use the simpler optimization criterion of minimizing the maximum density, then Golumbic's minimax tree algorithm [11], a variant of Huffman's algorithm, provides the optimal solution. In this variant, when two nodes are merged into a single node, the new node has a size equal to the maximum of the two merged nodes plus one (instead of the sum of the two nodes as in Huffman's algorithm). Similarly, if we must maximize the minimum density,
then using the minimum of the two merged nodes plus one gives the optimal 
solution. What makes our problem more difficult is that we must simultaneously 
take both the maximum density and the minimum density into account. We point 
out that there are simple example inputs for which it is not possible to construct 
a tree that simultaneously minimizes the maximum density and maximizes the 
minimum density. 
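The two one-sided variants can be sketched with a binary heap. This Python sketch computes only the optimal value, not the tree, and it assumes the Huffman-style rule of always merging the two smallest values, which is also the rule the marked-union algorithm of Fig. 1 uses (it unions the two rightmost, i.e. smallest, nodes). The function names are hypothetical.

```python
import heapq
from typing import Callable, List


def golumbic_value(sizes: List[int], combine: Callable[[int, int], int]) -> int:
    """Repeatedly merge the two smallest values a, b into combine(a, b) + 1."""
    if not sizes:
        raise ValueError("need at least one node")
    heap = list(sizes)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, combine(a, b) + 1)
    return heap[0]


def minimax_density(sizes: List[int]) -> int:
    """Smallest achievable maximum density (max of the merged pair plus one)."""
    return golumbic_value(sizes, max)


def maximin_density(sizes: List[int]) -> int:
    """Largest achievable minimum density (min of the merged pair plus one)."""
    return golumbic_value(sizes, min)
```

For sizes [4, 1, 1], the sketch returns 5 for the minimax variant and 3 for the maximin variant; the only tree shape with three leaves has depths 1, 2, 2, and placing the size-4 processor at depth 1 attains both values at once.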

Our main result is the introduction and analysis of a polynomial-time algorithm, called the marked-union algorithm, that, given a set of nodes with integer weights, finds a tree that minimizes the difference between the maximum density and the minimum density. This algorithm is in fact based on the maximin version of Golumbic's algorithm, and as a result it also maximizes the minimum density (which corresponds to minimizing the maximum load density on any processor). The difference between our algorithm and Golumbic's is that ours relies on an additional tie-breaking rule that determines which nodes to merge. This tie-breaking rule is crucial: Golumbic's algorithm without it can result in a density difference that is considerably worse than the optimal.²

That the marked-union algorithm minimizes the density difference is not obvious, and the proof of this is somewhat involved. As

² For example, on an input of $2^n - 1$ nodes of the same size, if ties are broken arbitrarily, then the density difference can be as bad as $n - 1$, whereas the optimal for this input is 1. We also point out that while this tie-breaking rule does not reduce the maximum load density in the system, it can significantly reduce the number of processors that must deal with this maximum load density.
further evidence that minimizing the density difference is not as easy to deal
with as the minimax and the maximin optimization criteria, we point out that 
the marked-union algorithm does not always return the optimal solution for 
non-integer inputs, and in fact the complexity of that case is an open problem. 
Nevertheless, the marked-union algorithm does provide a 2-approximation. 

We also provide lower and upper bounds on the competitive ratio of the 
online version of this problem. In this scenario, the algorithm has access to the 
entire current tree, and, on an arrival, must decide which node to split to achieve 
as good load balancing as possible. 

This paper is organized as follows: In Section 2, we define the problem more 
formally and present the marked-union algorithm. In Section 3, we prove the 
optimality of the marked-union algorithm. In Section 4 we describe our results 
for the competitive analysis of the online algorithm. Finally, in Section 5 we 
discuss future work. Due to space limits, most of our proofs can be found in our 
technical report [13]. 



1.1 Related Work 

There have been a number of designs for building a scalable structured P2P system, including CAN [17], HHT [2], Chord [19], Viceroy [14], Pastry [18], Tapestry [21], Distance Halving DHT [15], and Koorde [12]. Most of these DHTs achieve a log n ratio of load balancing with high probability, although many also consider schemes for improving this to O(1), sometimes at the cost of other performance measures. CAN [17] allows a measure of choice to make the load more balanced. This technique is analyzed in [2] for the closely related HHT; that paper demonstrates that it achieves a constant load balance ratio with high probability. All of the work mentioned above considers only homogeneous scenarios.

Work on heterogeneous load balancing has only just begun. [16] and [1] provide heuristics for dealing with heterogeneous nodes. In [16], migration and deletion of virtual servers is simulated in a system based on Chord. [1] uses the P-Grid structure. There, each node is assumed to know its own best load, and the system reacts accordingly to balance the load. This provides further motivation for our offline algorithm: it demonstrates that determining the optimal allocation for the processors currently in the system is a useful tool for load balancing of heterogeneous P2P systems.

Perhaps the closest work to ours is [7], which introduces a protocol and proves that it balances the heterogeneous nodes' load with high probability for a variant of the Chord DHT, assuming that a certain update frequency is guaranteed.
To the best of our knowledge, there has been no competitive analysis of load 
balancing problems for structured P2P systems. 

There has been a considerable amount of work on Huffman codes and their variants, which is related to our problem of constructing binary trees. [20] showed that if the input is pre-sorted, the running time of Huffman's algorithm can be reduced to O(n). [11] introduced the minimax tree algorithm already mentioned. [10] generalizes the Huffman code to a non-uniform encoding alphabet with the same goal of minimizing the expected codeword length. They gave a dynamic programming algorithm that finds the optimal solution for some cases, as well as a polynomial time approximation scheme for others.

There has recently been some work studying constrained versions of optimization problems arising in binary tree construction. [8] studied the problem of restructuring a given binary tree to reduce its height to a given value h, while at the same time preserving the order of the nodes in the tree and minimizing the displacement of the nodes. [9] considers the problem of constructing nearly optimal binary search trees with a height constraint.



2 Problem Statement and Algorithm 

We next provide a more formal description of the problem. Suppose we have m different possible processor sizes, $S_m > S_{m-1} > \cdots > S_i > \cdots > S_1 > 0$. Each size is an integer, and there are $n_i > 0$ processors of size $S_i$. Define a solution tree to be a full binary tree plus a bijection between its leaf nodes and the input processors. Let T be any solution tree. Denote the depth of a processor R in a tree T as $l_R$ and its size as $s_R$ (the root node has a depth of zero). The density of processor R is $d_{T,R} = s_R + l_R$.³ The density difference of T is the maximum density minus the minimum density. Our goal is to find the optimal solution tree T* that achieves the minimum density difference $\min_T \left( \max_R d_{T,R} - \min_R d_{T,R} \right)$.

For this problem we design the marked-union algorithm. In order to describe it, we start with some notation. This algorithm repeatedly merges nodes. To start with, the set of input processors forms the set of nodes. We refer to these initial nodes as real nodes. Each node maintains a minimum density attribute $d_{\min} = x$. We sometimes refer to such a node as (x). The real node for a processor has the processor's size as its $d_{\min}$.

The algorithm proceeds via a series of unions: an operation that consumes two nodes and produces a virtual node. The nodes it consumes can be real or virtual. When a union operation consumes nodes $(d_{1,\min})$ and $(d_{2,\min})$, the new node it generates will be $(\min(d_{1,\min}, d_{2,\min}) + 1)$.

The aspect of our algorithm that makes it unique as a variant of Huffman’s 
algorithm is the concept of a marked node. A virtual node is labeled as marked 
when it is formed if either of two conditions holds: (1) the two input nodes that
create that node have different values of dmin, or (2) either of the two input 
nodes was already marked. The marked-union algorithm will use this marking 
to break ties; this will lead us to the optimal solution. 

We are now ready to describe the algorithm (Fig. 1).

Note that all the input nodes start as unmarked real nodes, and we will prove in Lemma 1 that there will never be more than one marked node. The algorithm's running time is just $O(N \log N)$, where $N = \sum_{i=1}^{m} n_i$ is the total number of processors. Also, we can use it to return a solution tree: treat each real node as a leaf node, and on any

³ We will use the density for the offline analysis, but use the load density for the cost function of the online competitive analysis. Note their relationship: $d_{T,R} = \log L - \log ld_{T,R}$.
The Marked-Union Algorithm:

(1) Initialization:
      Sort the input nodes in decreasing order of size. Construct the working
      queue using the nodes in this order (from left to right).
(2) Processing the nodes:
(3) While (more than one node remains in the working queue) {
(4)     Union the rightmost two nodes, resulting in a new virtual node V;
(5)     Insert V into the working queue in the following order:
          from left to right, nodes decrease according to d_min; for nodes
          with the same d_min, marked nodes appear to the left of unmarked
          nodes; otherwise, ties are broken arbitrarily.
(6) } end while loop



Fig. 1. Marked-Union algorithm. 

union operation, the resulting virtual node is an internal node of the tree with 
pointers to the two nodes it consumed. Our main result will be to show that this 
tree is an optimal solution tree, which we call the best tree. 

We also point out that we can obtain the density difference of the resulting tree at the same time. To do so, we modify the algorithm by adding a new attribute to any marked node, which stores the maximum density of any node in its subtree. If we represent a marked node as $V(d_{\min}, d_{\max})$, we modify the union operation as follows:

$$(d_{1,\min}) \cup (d_{2,\min}) = \left(\min(d_{1,\min}, d_{2,\min}) + 1,\ \max(d_{1,\min}, d_{2,\min}) + 1\right) \quad \text{when } d_{1,\min} \neq d_{2,\min},$$

$$(d) \cup (d_{\min}, d_{\max}) = \left(\min(d, d_{\min}) + 1,\ \max(d, d_{\max}) + 1\right) \quad \text{when an unmarked node } (d) \text{ is merged with a marked node.}$$

As was already mentioned, we do not need to define a union operation for the case where two marked nodes are consumed. By defining union this way, it is easy to see that $d_{\min}$ will always be the minimum density in the subtree rooted at that node, and $d_{\max}$ will always be the maximum density. Thus, the value $d_{\max} - d_{\min}$ of the final marked node is the density difference of the tree. If the final node is unmarked, all nodes have the same density.
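The following Python sketch puts Fig. 1 and the $d_{\max}$ bookkeeping together. It is a minimal, unoptimized illustration under our own assumptions: the working queue is a Python list kept in Fig. 1's left-to-right order (so the rightmost, smallest-$d_{\min}$ node is at the end), ties between nodes with identical $(d_{\min}, \text{marked})$ keys are broken by placing the new node to the right, and leaf depths are updated eagerly, which costs $O(N^2)$ in the worst case rather than the $O(N \log N)$ stated above. The names `Node` and `marked_union` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    dmin: int                # minimum density in the subtree (relative to its root)
    dmax: int                # maximum density in the subtree (relative to its root)
    marked: bool = False
    leaves: List[int] = field(default_factory=list)  # processor indices in the subtree


def _rank(node: Node) -> Tuple[int, int]:
    # Left-to-right queue order: larger d_min first; for equal d_min, marked first.
    return (-node.dmin, 0 if node.marked else 1)


def marked_union(sizes: List[int]) -> Tuple[List[int], int]:
    """Return (depth of each processor, density difference of the resulting tree)."""
    if not sizes:
        raise ValueError("need at least one processor")
    depth = [0] * len(sizes)
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i])  # decreasing size
    queue = [Node(sizes[i], sizes[i], False, [i]) for i in order]
    while len(queue) > 1:
        right = queue.pop()   # rightmost node (smallest d_min)
        left = queue.pop()    # second rightmost node
        for leaf in left.leaves + right.leaves:
            depth[leaf] += 1  # every leaf under the new virtual node moves down one level
        new = Node(
            dmin=min(left.dmin, right.dmin) + 1,
            dmax=max(left.dmax, right.dmax) + 1,
            marked=left.marked or right.marked or left.dmin != right.dmin,
            leaves=left.leaves + right.leaves,
        )
        # Insert after every node whose rank is <= the new node's rank.
        pos = 0
        while pos < len(queue) and _rank(queue[pos]) <= _rank(new):
            pos += 1
        queue.insert(pos, new)
    root = queue[0]
    return depth, root.dmax - root.dmin
```

On seven processors of equal size 1 (the $2^n - 1$ example of footnote 2 with n = 3), the sketch returns depths [2, 3, 3, 3, 3, 3, 3] and density difference 1.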



2.1 Marked-Union with the Splitting Restriction 

Before we prove the optimality of the tree resulting from this algorithm, we show that this tree can be constructed under the restriction that it is built up using a sequence of splitting operations. In other words, we assume that the nodes must be inserted into the tree one at a time, and each such insertion must be handled by splitting a node of the tree. In order to do so, we first sort the processors by size. We assume that the input processor sequence is $p_m, p_{m-1}, \ldots, p_2, p_1$ with sizes satisfying $s_m \ge s_{m-1} \ge \cdots \ge s_2 \ge s_1$. We next run the marked-union algorithm and record the resulting depth $l_i^*$ for each processor $p_i$.

We then process the whole input sequence from left to right. We use the first node as the root node of the solution tree. For each subsequent node $p_j$, choose the leftmost node $p_i$ of the already processed input sequence that has a current depth $l_i < l_i^*$ in the solution tree thus far. As we demonstrate in the proof of Theorem 1 in [13], such a node must exist. Split the node $p_i$ and add $p_j$ as its new sibling. When all processors have been processed, return the final tree. Let T* be this tree.
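A minimal Python sketch of this splitting construction, tracking only the depths (which is all Theorem 1 speaks about). The target depths $l_i^*$ are taken as input, for example from a marked-union run; equal-size processors are ordered by index, which is an assumption of this sketch, and the targets should come from a run that orders ties the same way. The function name `build_by_splitting` is hypothetical.

```python
from typing import Dict, List


def build_by_splitting(sizes: List[int], target: List[int]) -> List[int]:
    """Insert processors one at a time by splitting leaves; return final depths.

    `target[i]` is the depth l_i^* recorded for processor i; only depths are
    tracked here, not the tree itself.
    """
    # Decreasing size; equal sizes keep index order (an assumption of this sketch).
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i])
    depth: Dict[int, int] = {order[0]: 0}    # the first processor is the root
    processed = [order[0]]
    for j in order[1:]:
        # Leftmost processed processor whose current depth is still less than its target.
        chosen = next((i for i in processed if depth[i] < target[i]), None)
        if chosen is None:
            raise ValueError("no node can be split; targets are not consistent")
        depth[chosen] += 1         # splitting pushes the old leaf one level down
        depth[j] = depth[chosen]   # and p_j becomes its sibling at the same depth
        processed.append(j)
    return [depth[p] for p in range(len(sizes))]
```

With sizes [3, 2, 2, 1] and targets [1, 2, 3, 3], the construction reproduces exactly the target depths [1, 2, 3, 3], as Theorem 1 predicts.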

Theorem 1. In the tree T*, each processor $p_i$ will have $l_i = l_i^*$.

3 The Optimality of the Marked-Union Algorithm 

Theorem 2. The marked-union algorithm returns an optimal solution tree 
w.r.t. the minimum density difference. 

This section is devoted to proving Theorem 2. We start with a high level 
overview of this proof. The first step (Section 3.1) is to define the concept of 
rounds. Very roughly, there is one round for each different size processor that 
appears in the input, and this round consists of the steps of the algorithm be- 
tween when we first process nodes of that size and when we first process nodes 
of the next larger size from the original input. Once we have defined rounds, we 
prove a number of properties concerning how the virtual nodes in the system 
evolve from one round to the next. 

The second step of the proof (Section 3.2) defines regular trees, a class of 
solution trees that satisfy a natural monotonicity property. There always exists 
some optimal solution that is a regular tree, and thus we prove a number of 
properties about the structure of these trees. In particular, we examine sets of 
subtrees of a regular tree. There is one set of subtrees for each processor size, and 
these subtrees are formed by examining the lowest depth where that processor 
size appears in the tree. The third step (Section 3.3) uses the results developed in 
the first two steps to show that the tree produced by the marked-union algorithm 
has a density difference that is no larger than any regular tree. We do so via an 
induction on the rounds of the algorithm, where the virtual nodes present after 
each round are shown to be at least as good as a corresponding set of subtrees 
of any regular tree. 



3.1 Analysis of Marked-Union Algorithm 

Now let us do the first step of the proof of Theorem 2. We first introduce the definition of rounds for the purpose of analyzing the algorithm. Note that rounds are not a concept used in the algorithm itself.

Let us divide the algorithm into m rounds, each of which contains some consecutive union operations. $\text{Round}_1$'s start point is the start point of the algorithm. For $i > 1$, $\text{Round}_i$'s start point is the end point of $\text{Round}_{i-1}$. For $i < m$, the end point of $\text{Round}_i$ is right before the first union operation that will either consume a real node of size $S_{i+1}$ or generate a virtual node with a size bigger than $S_{i+1}$. All unions between these two points belong to $\text{Round}_i$. $\text{Round}_m$'s end point is when a single node is left and the algorithm halts.
Lemma 1. There can be at most one marked node throughout one run of the algorithm, and the marked node will always be a node with the smallest $d_{\min}$ in the working queue.

Let $\Delta_i$ be $d_{\max} - d_{\min}$ of the marked node after $\text{Round}_i$ (before $\text{Round}_{i+1}$), and let $\Delta_i = 0$ if there is no marked node at that time. We will prove in Lemma 2 that after each round all virtual nodes have the same $d_{\min}$; let $d_i$ be the $d_{\min}$ of the virtual nodes after $\text{Round}_i$.

Lemma 2. After $\text{Round}_i$ (before $\text{Round}_{i+1}$), we have two possible cases:

a) a single node $V(d_i, d_i + \Delta_i)$, with $d_i < S_{i+1}$ and $\Delta_i \ge 0$;

b) a set of virtual nodes with the same $d_{\min}$, namely $k$ ($k > 0$) unmarked virtual nodes $V(d_i)$ and another marked or unmarked node $V(d_i, d_i + \Delta_i)$, with $d_i = S_{i+1}$ and $\Delta_i \ge 0$.



Lemma 3. There are two possible cases for $\Delta_i$:
case a): $\Delta_i = \max(S_i - d_{i-1}, \Delta_{i-1})$;
case b): $\Delta_i = 1$, $\Delta_{i-1} = 0$, and $d_{i-1} = S_i$.

If case b) happens after $\mathrm{Round}_i$, then we call $\mathrm{Round}_i$ a b-round; otherwise
$\mathrm{Round}_i$ is a normal round. Note that the first marked node appears in case b), and
this case can happen only once: there can be at most one b-round throughout
one run of the algorithm.

Corollary 1. $\Delta_i = 0 \Rightarrow \Delta_{i-1} = \Delta_{i-2} = \cdots = \Delta_1 = 0$ and $d_{j-1} = S_j$ for all $j < i$.



Theorem 3 (Gol76). The marked-union algorithm maximizes the final $d_{\min}$.



3.2 Definitions and Conclusions About Regular Trees 

Now let us do the second step of the proof of Theorem 2. We first introduce
a special class of solution trees and then prove some of its properties.
A regular tree is a solution tree which satisfies the following: for any two processors
$i$ and $j$, if $S_i > S_j$ then $l_i \le l_j$.



Observation 1. There exists an optimal solution tree that is a regular tree. 

For any regular tree, let $l_{i,\max}$ be the maximum depth of the leaf nodes with
size $S_i$. By the definition of a regular tree, it is easy to see that
$l_{m,\max} \le l_{m-1,\max} \le \cdots \le l_{2,\max} \le l_{1,\max}$.
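The regular-tree condition and the ordering of the $l_{i,\max}$ can be checked mechanically on small examples. The following Python sketch is illustrative only: it assumes, hypothetically, that a solution tree is summarized by the list of its leaves as `(depth, size)` pairs, with sizes given as integers on the same (log) scale as the densities. The function names are not from the paper.

```python
from typing import Dict, List, Tuple

Leaf = Tuple[int, int]  # hypothetical encoding of one leaf: (depth l, size S)


def is_regular(leaves: List[Leaf]) -> bool:
    """Regular tree: for any two processors, S_i > S_j implies l_i <= l_j."""
    return all(li <= lj
               for li, si in leaves
               for lj, sj in leaves
               if si > sj)


def max_depth_by_size(leaves: List[Leaf]) -> Dict[int, int]:
    """l_{i,max}: the maximum depth of the leaves of each size S_i."""
    out: Dict[int, int] = {}
    for depth, size in leaves:
        out[size] = max(out.get(size, depth), depth)
    return out


def depths_are_ordered(leaves: List[Leaf]) -> bool:
    """Check l_{m,max} <= l_{m-1,max} <= ... <= l_{1,max} (largest size first)."""
    lmax = max_depth_by_size(leaves)
    depths = [lmax[s] for s in sorted(lmax, reverse=True)]
    return all(a <= b for a, b in zip(depths, depths[1:]))
```

For a regular tree the ordering check always succeeds, which is the implication stated above. The converse does not hold. For example, a size-3 leaf at depth 2 with size-2 leaves at depths 1 and 3 passes the ordering check but is not regular.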

Then, if we look across the regular tree at depth $l_{i,\max}$, the nodes
we get are internal nodes or leaf nodes. For each of the internal nodes
$I_v$, all its descendants together with $I_v$ form a subtree with $I_v$ as the root; each
leaf node with a processor size $s < S_i$ can also be viewed as a one-node subtree,
and thus we have a set of subtrees. Define $k_{i-1}$ ($k_{i-1} \ge 1$) as
the number of these subtrees, and define the set of these subtrees as
$V_{i-1} = \{v_{i-1,1}, v_{i-1,2}, \ldots, v_{i-1,k_{i-1}}\}$. Furthermore, we define $k_m = 1$: $V_m$ has
only one subtree, the regular tree itself.

From its definition, all subtrees of $V_i$ have the same depth $l_{i+1,\max}$ in the regular
tree (the depth of a subtree means its root's depth). From the definition
of a regular tree, we know that all the leaf nodes with size $S_i$ lie at depths between
$l_{i+1,\max}$ and $l_{i,\max}$ (inclusive) in the regular tree; thus
$v_{i,1}, \ldots, v_{i,j}, \ldots, v_{i,k_i}$ can be viewed as being constructed from
$v_{i-1,1}, v_{i-1,2}, \ldots, v_{i-1,j}, \ldots, v_{i-1,k_{i-1}}$ plus the $n_i$ leaf nodes of size $S_i$.
Notice that in $V_1$ there are no subtrees of $V_0$ below; its subtrees are formed only by
leaf nodes of size $S_1$. Also, the roots of $v_{i-1,1}, v_{i-1,2}, \ldots, v_{i-1,k_{i-1}}$ all have
the same relative depth $l_{i,\max} - l_{i+1,\max}$ in their residing subtrees in $V_i$. This depth is
also the maximum relative depth in $V_i$ for real nodes with size $S_i$, and there is at least
one leaf node with size $S_i$ that lies at this relative depth in one of the subtrees of $V_i$.

Let the relative density be a node's density with respect to some subtree of the
regular tree; i.e., if a node of size $S_i$ has depth $l_i$ in a regular tree, and the
node is also in some subtree whose root has depth $l_r$ in the regular tree, then
the node's relative density with respect to the subtree is $(l_i - l_r) + S_i$. From the above,
we know that the subtrees in $V_i$ all have the same root depth in the regular
tree, so we can talk about relative density with respect to a set of subtrees. Let
$d_{\min,i}$ be the minimum relative density w.r.t. $V_i$ of all the leaf nodes that lie
in any of the subtrees of $V_i$, let $d_{\max,i}$ be the maximum relative density, and let
$\Delta d_i = d_{\max,i} - d_{\min,i}$.
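As a small illustration of these definitions, the sketch below computes relative densities and the spread $\Delta d_i$ for a set of subtrees that share a root depth $l_r$. It encodes each leaf as a `(depth, size)` pair and uses helper names of our own; both are assumptions made for this example and do not come from the paper.

```python
from typing import List, Tuple

Leaf = Tuple[int, int]  # hypothetical encoding: (depth in the regular tree, size S)


def relative_densities(leaves: List[Leaf], root_depth: int) -> List[int]:
    """Relative density of a leaf w.r.t. a subtree rooted at depth l_r: (l - l_r) + S."""
    return [(depth - root_depth) + size for depth, size in leaves]


def density_spread(subtrees: List[List[Leaf]], root_depth: int) -> Tuple[int, int, int]:
    """(d_min,i, d_max,i, Delta d_i) over all leaves of subtrees sharing one root depth."""
    dens = [d for leaves in subtrees for d in relative_densities(leaves, root_depth)]
    lo, hi = min(dens), max(dens)
    return lo, hi, hi - lo
```

A one-node subtree (a leaf sitting exactly at depth $l_r$) simply contributes its own size as its relative density.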

Claim 1. For any regular tree, 

( 1 ) 

$$\Delta d_i \ge \lvert S_i - d_{\min,i-1} \rvert \qquad (2)$$

Corollary 2. If after $\mathrm{Round}_i$ we have a single virtual node in our algorithm,
then, compared with the $V_i$ of any regular tree on the same input,

$$d_i \ge d_{\min,i}. \qquad (3)$$

3.3 Comparing Marked-Union’s Output to Regular Trees 

Now we have prepared enough for the proof of Theorem 2. By Observation 1,
there exists at least one regular tree that is optimal. If we can prove that no regular
tree on the same input can beat our algorithm, then we have proved that the marked-union
algorithm is optimal. So the lemma below is what we need to prove.

Lemma 4. For any given input, let $\min(\Delta d_i)$ be the minimum $\Delta d_i$ over all regular
trees for the same input. For any $\mathrm{Round}_i$, if it is the last round or it is not the
b-round, then $\min(\Delta d_i) \ge \Delta_i$.

Lemma 4 (proof in [13]) is the core of the whole optimality proof. It
shows that the marked-union algorithm's output is no worse than any regular
tree in terms of density difference. Combined with Observation 1, which guarantees
that at least one optimal solution tree is also a regular tree, this proves that the
marked-union algorithm returns an optimal tree, i.e., it is an optimal algorithm
that minimizes the solution tree's density difference.
4 Competitive Analysis

The marked-union algorithm is designed for the offline case, where we know 
the entire set of available processors before any decisions need to be made. We 
here consider the online case, where the algorithm maintains a current tree, and 
processors arrive sequentially. 

4.1 The Model 

We describe our model in terms of three parties: the adversary (which chooses 
the input), the online player (the algorithm), and the third party. At the 
start of the online problem, the third party specifies some common knowledge, 
known as problem parameters, to both the online player and the adversary. 
In our model, the common knowledge is a set of possible processor sizes. The 
adversary then serves the online player with a sequence of processor sizes. The 
only information the online player has prior to an arrival is that the size of 
that arrival must be in the set of possible sizes given by the third party. The 
adversary also determines when the input sequence halts; the online player only 
sees this when it actually happens. Without a restriction on possible sizes (as 
is imposed by the third party), the adversary can make the performance of 
any online algorithm arbitrarily bad, and thus this aspect of the model gives 
us a way to compare the performance of online algorithms. Furthermore, it 
is reasonable to assume that real DHTs have some knowledge of the arriving
processors' sizes, e.g., the range of the sizes.

Since the load density (as opposed to density) represents our real performance
metric for DHTs, we use load density in this section. In particular, we use the load
density ratio of the final solution tree when the adversary decides
it is time to halt. We point out that instead of the cost of the final solution
tree, other choices to consider would be a sum of costs or the maximum cost [4]. 
However, for our case, when the tree is growing, it is not in a stable situation yet. 
This transition period should be negligible because it is not a typical working 
state for the system; we are really concerned with the long term effects, for which 
our model is more appropriate. 

4.2 Bounds for Competitive Ratios 

For minimization problems, the competitive ratio [4] of an online
algorithm is defined as the worst-case ratio, over all allowed input sequences,
between the cost incurred by the online algorithm and the cost incurred by an optimal
offline algorithm. As already mentioned, we use the original load density ratio as
the cost function. Since the node density we used earlier is the log of a node's load
density, the log of the competitive ratio is the difference between the density
differences of the online algorithm and of the optimal offline algorithm. Sometimes
we use the log value of the real competitive ratio for convenience. Define the set
of input processor sizes which the third party specifies as $\mathcal{S} = \{S_i \mid 1 \le i \le m\}$,
with $S_m > S_{m-1} > \cdots > S_2 > S_1$. Define $\Delta S = S_m - S_1$, i.e., the largest size
difference for the processors specified by the third party. Since the case where
the third party specifies only a single processor size is very easy to analyze,
we only consider the heterogeneous case where $m > 1$. Define $S_{\max} = S_m$ as
the largest processor's size and $S_{\min} = S_1$ as the smallest processor's size. We first
provide the lower bound for the competitive ratio.
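The relation between the two cost scales can be illustrated numerically. This sketch assumes base-2 logarithms and uses hypothetical helper names. The worst case over a finite set of observed runs only estimates the competitive ratio from below, because the true ratio is a worst case over all allowed input sequences.

```python
from typing import Iterable, Tuple


def log_competitive_ratio(online_diff: float, offline_diff: float) -> float:
    """log2 of the load-density ratio: online density difference minus optimal offline one."""
    return online_diff - offline_diff


def competitive_ratio(online_diff: float, offline_diff: float) -> float:
    """Back on the load-density scale, assuming density = log2(load density)."""
    return 2.0 ** log_competitive_ratio(online_diff, offline_diff)


def size_spread(sizes: Iterable[float]) -> float:
    """Delta S = S_m - S_1 for the sizes announced by the third party (m > 1)."""
    distinct = sorted(set(sizes))
    if len(distinct) < 2:
        raise ValueError("only the heterogeneous case m > 1 is considered")
    return distinct[-1] - distinct[0]


def worst_observed_ratio(runs: Iterable[Tuple[float, float]]) -> float:
    """Max ratio over finitely many (online, offline) runs: a lower estimate of the true ratio."""
    return max(competitive_ratio(on, off) for on, off in runs)
```

For example, an online density difference of 3 against an optimal offline difference of 1 corresponds to a load-density ratio of $2^2 = 4$.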

Theorem 4. For any deterministic online algorithm for this load balancing
problem, the competitive ratio is at least $2^{\Delta S}$.

We point out that the proof of this in [13] is made more complicated by
the fact that it is described for an adversary with a further restriction. In
particular, if we further require the adversary to introduce at least one
processor of each size given by the third party, the bound still holds. This
demonstrates that the lower bound really is a function of the difference between
the largest processor and the smallest processor. Of course, any lower bound for
this restricted scenario still holds for the unrestricted scenario.

We next point out that a very simple algorithm can almost match this lower 
bound. In particular, the lower and upper bounds are within a factor of 2. 

Theorem 5. There is an online algorithm with a competitive ratio of2^^~^^ for 
this load balancing problem. 

From this we also know an upper bound for the best competitive ratio any online 
algorithm could have. 

5 Future Work 

Since our marked-union algorithm efficiently solves the centralized offline opti- 
mization problem, the most important open problem is to design a distributed 
online algorithm. It is possible that algorithms based on ideas from the marked- 
union algorithm may provide some direction. Although the online lower bound 
is somewhat discouraging, it is possible that one can do better using randomiza- 
tion. Also, it seems likely that stochastic models of input sequences are easier 
to deal with. Finally, the case of multiple, independent measures of a
processor's power appears to be an interesting but challenging question to pursue.
Abstract. We consider the generalized version of the stable marriage
problem where each man's and woman's preference list may have ties.
Furthermore, each man and woman wishes to be matched to as many
acceptable partners as possible, up to his or her specified quota. The
many-to-many version of the stable marriage problem has wide applications
in matching retailers and shopkeepers in e-marketplaces. We investigate
different forms of stability in this context and describe an algorithm
to find a strongly stable matching (if one exists) in the context of the
multiple partner stable marriage problem with ties. In the context of the
Hospitals/Residents problem, for which only the resident-oriented algorithm
for finding a strongly stable matching is known, this algorithm
gives a hospital-oriented version as well. Furthermore,
in any instance of the many-to-many stable marriage problem with ties, we
show that the set of strongly stable matchings forms a distributive lattice.
The results in this paper extend those already known for the one-to-one
version and many-to-one version (Hospitals/Residents problem) of the
problem.



1 Introduction 

The stable assignment problem, first described by Gale and Shapley [3] as 
the stable marriage problem, involves an equal number of men and women 
each seeking one partner of the opposite sex. Each person ranks all members 
of the opposite sex in strict order of preference. A matching is defined to be 
stable if no man and woman who are not matched to each other both prefer 
each other to their current partners. Gale and Shapley showed the existence 
of at least one stable matching for any instance of the problem by giving 
an algorithm for finding it [3]. An introductory discussion of the problem is 
given by Polya et al. [6], and an elaborate study is presented by Knuth [11]. 
Variants of this problem have been studied by Gusfield and Irving [5] amongst 
others, including cases where the orderings are over partial lists or contain 
ties. It is known that a stable matching can be found in each of these cases 
individually in polynomial time (Gale and Sotomayor [4], Gusfield and Irving 
[5]). However, when incomplete lists and ties occur simultaneously, the problem
of finding a maximum-cardinality stable matching becomes NP-hard (Iwama et al. [10]).
Baiou and Balinski [1], among others, studied the many-to-many version of the stable marriage problem.
McVitie and Wilson [14] pointed out that the algorithm by Gale and
Shapley [3], in which men propose to women, generates a male-optimal solution
in which every man gets the best partner he can in any stable matching
and every woman gets the worst partner she can in any stable matching.
They suggested an egalitarian measure of optimality under which the sum of the
ranks of partners for all men and women is to be minimized. Irving et al.
[7] provided an efficient algorithm to find a stable matching satisfying the
optimality criterion of McVitie and Wilson [14]. This problem was extended to
the general case of the multiple partner stable marriage problem in [2].

Further research into different versions of the stable marriage problem with
ties in the preference lists gave rise to the notions of weak stability, super-stability
and strong stability [8]. In the context of the original one-to-one stable
marriage problem, weak stability means that there is no couple $(m, f)$, each of
whom strictly prefers the other over her or his current match. It was shown
in [10] that finding the largest weakly stable matching in an instance of the stable
marriage problem with ties and incomplete preference lists is NP-hard. Strong stability is more
restrictive, in the sense that a matching $\mathcal{M}$ is strongly stable if there is no couple
$(x, y)$ such that $x$ strictly prefers $y$ to his/her partner in $\mathcal{M}$, and $y$ either strictly
prefers $x$ to her/his partner in $\mathcal{M}$ or is indifferent between them. Finally, a
matching $\mathcal{M}$ is super-stable if there is no couple $(x, y)$, each of whom either
strictly prefers the other to his/her partner in $\mathcal{M}$ or is indifferent between them.
Polynomial-time algorithms were given for finding strongly stable matchings
(if one exists) in the context of the one-to-one stable marriage problem [12] and
the many-to-one stable marriage problem [9] (Hospitals/Residents problem).

Our work extends the recent results of Irving et al. [9] and Manlove [12]
to the many-to-many version of the stable marriage problem with ties, where
each man or woman may have multiple partners, and answers some of the
questions posed in [9].

The need for such an algorithm arises in the context of e-marketplaces,
wherein shopkeepers and retailers are matched based on their preferences and
the number of parties they wish to be matched to, which is a function of the
availability of resources or time. After the matching is done, any party can shop
from or sell to a restricted and more promising set of candidates, leading to
more chances of an agreement. In such a case, finding a strongly stable matching is
extremely desirable. We seek to investigate this type of stability in the multiple
partner version of the stable marriage problem with ties (MMSMPT).

2 Multiple Partner Stable Marriage Model 

Let $M = \{m_1, \ldots, m_{|M|}\}$ and $F = \{f_1, \ldots, f_{|F|}\}$ respectively denote the sets
of $|M|$ males and $|F|$ females. Every person has a preference order over those
members of the opposite sex that he or she considers acceptable. There may be
ties indicating indifference in the preference lists of both males and females. Let
$L_m$ be the preference list of male $m$, with the most preferred set of partner(s)
occurring ahead of others in the list. Similarly, $L_f$ represents the preference
ordering of female $f$. Incomplete lists are allowed, so that $|L_m| \le |F|$ for all $m \in M$
and $|L_f| \le |M|$ for all $f \in F$. Each person also has a quota on the total number of
partners with which he or she may be matched. Furthermore, a person prefers
to be matched to a person in his or her preference list rather than to be matched to fewer
partners than his or her quota. Let $q_m$ and $q_f$ denote
the respective quotas of male $m$ and female $f$. The multiple partner stable
marriage problem can thus be specified by $P = (M, F, L_M, L_F, Q_M, Q_F)$, where
$L_M$ and $L_F$ respectively denote the $|M| \times 1$ and $|F| \times 1$ vectors of male and
female preference lists, and $Q_M$ and $Q_F$ represent the $|M| \times 1$ and $|F| \times 1$
vectors of male and female quotas respectively.

In an instance of MMSMPT, a male-female pair $(m, f)$ is considered
feasible if $m$ and $f$ are in each other's preference lists, that is, $m \in L_f$ and
$f \in L_m$. A matching $\mathcal{M}$ is defined to be a set of feasible male-female pairs
$\{(m, f)\}$ such that, for all $m \in M$, $m$ appears in at most $q_m$ pairs and, for all $f \in F$, $f$
appears in at most $q_f$ pairs. A matching $\mathcal{M}$ in the context of the multiple partner
stable marriage problem is said to be stable if any feasible pair $(m, f) \notin \mathcal{M}$
implies that at least one of the two ($m$ or $f$) is matched to its full quota of
partners, all of whom he or she considers better. This implies that for any stable
matching $\mathcal{M}$, there cannot be any unmatched feasible pair $(m, f)$ that can
be paired with both $m$ and $f$ becoming better off. Let $\mathcal{M}_m$ denote the set of
partners of male $m$ in the matching $\mathcal{M}$; $\mathcal{M}_f$ is defined similarly. For
a feasible pair $(m, f)$ and a given matching $\mathcal{M}$, we define a relation $\prec$ where
$f \prec_m \mathcal{M}_m$ means that either $|\mathcal{M}_m| < q_m$, or $m$ prefers $f$ to at least one of his
matched partners in $\mathcal{M}_m$. Likewise, $m \prec_f \mathcal{M}_f$ means that either $|\mathcal{M}_f| < q_f$, or
$f$ prefers $m$ to at least one of her matched partners in $\mathcal{M}_f$. A matching $\mathcal{M}$ in an
instance of MMSMPT is weakly stable if there is no pair $(m, f) \notin \mathcal{M}$ such that
$m \prec_f \mathcal{M}_f$ and $f \prec_m \mathcal{M}_m$. Since the one-to-one version of the stable marriage problem
with ties is an instance of MMSMPT with $q_m = 1$ and $q_f = 1$ for all $m, f$,
finding the largest weakly stable matching in MMSMPT is NP-hard,
as follows from the results in [10].

Before defining strong stability and super-stability, we define a relation $\preceq$
for a feasible pair $(m, f)$, where $f \preceq_m \mathcal{M}_m$ means that either $f \prec_m \mathcal{M}_m$ or $m$
is indifferent between $f$ and at least one of the partners in $\mathcal{M}_m$; $m \preceq_f \mathcal{M}_f$ is
defined likewise.

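Definition 1 (Strong Stability). A matching $\mathcal{M}$ is strongly stable if there is
no feasible pair $(m, f) \notin \mathcal{M}$ such that either $f \prec_m \mathcal{M}_m$ and $m \preceq_f \mathcal{M}_f$, or
$m \prec_f \mathcal{M}_f$ and $f \preceq_m \mathcal{M}_m$.

Definition 2 (Super Stability). A matching $\mathcal{M}$ is super-stable if there is no
feasible pair $(m, f) \notin \mathcal{M}$ such that $m \preceq_f \mathcal{M}_f$ and $f \preceq_m \mathcal{M}_m$.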
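To make weak stability and Definitions 1 and 2 concrete, the following sketch classifies a given matching by brute force over all feasible pairs. It assumes a hypothetical representation: preference lists are lists of ties (best tie first), quotas are stored in one dictionary keyed by person (male and female names are assumed distinct), and a matching is a set of `(male, female)` pairs that already respects the quotas.

```python
from typing import Dict, Iterator, List, Set, Tuple

Prefs = Dict[str, List[List[str]]]  # person -> preference list as ties, best tie first
Pair = Tuple[str, str]              # (male, female)


def rank(prefs: Prefs, p: str, q: str) -> int:
    """Index of the tie containing q on p's list (smaller is better)."""
    for i, tie in enumerate(prefs[p]):
        if q in tie:
            return i
    raise KeyError(f"{q} is not acceptable to {p}")


def strictly_better(prefs: Prefs, quota: Dict[str, int], p: str, q: str, mates: Set[str]) -> bool:
    """q ≺_p mates: p is under quota, or p strictly prefers q to at least one mate."""
    if len(mates) < quota[p]:
        return True
    r = rank(prefs, p, q)
    return any(r < rank(prefs, p, x) for x in mates)


def weakly_better(prefs: Prefs, quota: Dict[str, int], p: str, q: str, mates: Set[str]) -> bool:
    """q ⪯_p mates: q ≺_p mates, or p is indifferent between q and at least one mate."""
    r = rank(prefs, p, q)
    return strictly_better(prefs, quota, p, q, mates) or any(r == rank(prefs, p, x) for x in mates)


def feasible_pairs(men: Prefs, women: Prefs) -> Iterator[Pair]:
    """Pairs (m, f) with f on m's list and m on f's list."""
    for m, ties in men.items():
        for tie in ties:
            for f in tie:
                if any(m in t for t in women.get(f, [])):
                    yield (m, f)


def stability(men: Prefs, women: Prefs, quota: Dict[str, int], matching: Set[Pair]) -> Dict[str, bool]:
    """Report which of weak, strong and super stability the matching satisfies."""
    men_mates = {m: {f for (x, f) in matching if x == m} for m in men}
    women_mates = {f: {m for (m, y) in matching if y == f} for f in women}
    result = {"weak": True, "strong": True, "super": True}
    for m, f in feasible_pairs(men, women):
        if (m, f) in matching:
            continue
        f_s = strictly_better(men, quota, m, f, men_mates[m])      # f ≺_m M_m
        f_w = weakly_better(men, quota, m, f, men_mates[m])        # f ⪯_m M_m
        m_s = strictly_better(women, quota, f, m, women_mates[f])  # m ≺_f M_f
        m_w = weakly_better(women, quota, f, m, women_mates[f])    # m ⪯_f M_f
        if f_s and m_s:
            result["weak"] = False
        if (f_s and m_w) or (m_s and f_w):
            result["strong"] = False
        if f_w and m_w:
            result["super"] = False
    return result
```

Consider a male with quota 1 who is indifferent between two females, each of whom accepts only him. Matching him to either female is weakly stable but neither strongly stable nor super-stable. This shows why a strongly stable matching need not exist when ties are present.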
3 Finding a Strongly Stable Matching

In this section, we describe our algorithm for finding a strongly stable matching
(if one exists) for an instance of MMSMPT, and prove its correctness. A
male $m$ (or female $f$) such that $|\mathcal{M}_m| = q_m$ (respectively $|\mathcal{M}_f| = q_f$) is said to be
fully-matched in the matching $\mathcal{M}$. During the execution of the algorithm, males and
females become engaged, and it is possible for a male/female to be temporarily matched
(or engaged) to more partners than his/her specified quota. At any stage, a
female/male is said to be over-matched or under-matched according as she/he
is matched to a number of partners greater than, or less than, the specified
quota. We describe a female/male as marked if at any time during the
execution of the algorithm she/he has been engaged to at least her/his
specified quota of partners. The algorithm proceeds by deleting from the
preference lists pairs that cannot be strongly stable. By the deletion of
a pair $(m, f)$, we mean the removal of $f$ and $m$ from each other's lists and,
if $m$ and $f$ are temporarily matched/engaged to each other, the breaking of
this engagement. By the head and tail of a preference list at a given point of
time, we mean the first and last ties respectively on that list (note that a tie
can be of length 1). We say that a female $f$ (male $m$) is dominated in a male
$m$'s (female $f$'s) preference list if $m$ (respectively $f$) prefers to $f$ (respectively $m$) at least $q_m$
females ($q_f$ males) who are engaged to him (her). A male $m$ who is engaged to a female
$f$ is said to be bound to $f$ if $f$ is not over-matched or $m$ is not in $f$'s tail (or both).
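The head/tail, domination and binding notions translate directly into small predicates. The sketch below assumes that current preference lists are stored as lists of non-empty ties and that the caller supplies the set of partners currently engaged to a person. All names are illustrative, not taken from the paper.

```python
from typing import Dict, List, Set

Prefs = Dict[str, List[List[str]]]  # person -> current list as non-empty ties, best tie first


def rank(prefs: Prefs, p: str, q: str) -> int:
    """Index of the tie containing q on p's current list (smaller is better)."""
    for i, tie in enumerate(prefs[p]):
        if q in tie:
            return i
    raise KeyError(f"{q} is not on {p}'s list")


def head(prefs: Prefs, p: str) -> Set[str]:
    """First tie of p's current list (may have length 1)."""
    return set(prefs[p][0]) if prefs[p] else set()


def tail(prefs: Prefs, p: str) -> Set[str]:
    """Last tie of p's current list (may have length 1)."""
    return set(prefs[p][-1]) if prefs[p] else set()


def is_dominated(prefs: Prefs, quota: Dict[str, int], p: str, q: str, engaged_to_p: Set[str]) -> bool:
    """q is dominated on p's list if p strictly prefers at least quota[p] engaged partners to q."""
    r = rank(prefs, p, q)
    better = sum(1 for x in engaged_to_p if rank(prefs, p, x) < r)
    return better >= quota[p]


def is_bound(women: Prefs, quota: Dict[str, int], m: str, f: str, engaged_to_f: Set[str]) -> bool:
    """An engaged male m is bound to f if f is not over-matched or m is not in f's tail (or both)."""
    over_matched = len(engaged_to_f) > quota[f]
    return (not over_matched) or (m not in tail(women, f))
```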

An engagement graph $G$ has a vertex for each male and each female, with
$(m, f)$ forming an edge if male $m$ and female $f$ are engaged. A feasible matching
(set of engagements) in the engagement graph is a matching $\mathcal{M}$ such that,
if any male $m$ is bound to more than his quota of partners, then $m$ is
matched to $q_m$ of these partners in $\mathcal{M}$, subject to the constraint that $\mathcal{M}$ has
maximum possible cardinality. A reduced assignment graph $G_R$ is formed from
an engagement graph as follows. For each $m$ and for any female $f$ such that $m$ is
bound to $f$, we delete the edge $(m, f)$ from the graph and reduce the quotas of
both $f$ and $m$ by one each; furthermore, we remove all other edges incident
to $m$ if the quota of $m$ gets reduced to 0. Each isolated vertex or vertex
whose quota gets reduced to 0 (corresponding to a male or female) is then
removed from the graph. For each male $m$ and female $f$ in the remaining graph,
we denote the revised quotas by $q'_m$ and $q'_f$ respectively. Given a set $Z$ of
males (females) in $G_R$, define the neighborhood of $Z$, $N(Z)$, to be the set of
vertices (of the opposite sex) adjacent in $G_R$ to a vertex in $Z$. The
deficiency of $Z$ is defined by $\delta(Z) = \sum_{m \in Z} q'_m - \sum_{f \in N(Z)} q'_f$. If $Z_1$ and
$Z_2$ are maximally deficient, then so is $Z_1 \cap Z_2$, which implies that there is a
unique minimal set with maximum deficiency, which is called the critical set [12].
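The construction of $G_R$ and of the critical set can be sketched as follows, under two stated assumptions. First, the engagement graph and the set of bound pairs are given explicitly as sets of `(male, female)` pairs. Second, the critical set is found by enumerating all subsets of males. That enumeration is exponential, so the sketch suits only tiny examples. It relies on the intersection property of maximally deficient sets rather than on any polynomial-time method, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (male, female)


def reduced_assignment_graph(engaged: Set[Pair], bound: Set[Pair],
                             q_male: Dict[str, int], q_female: Dict[str, int]):
    """Build G_R: drop each bound edge and charge it to both quotas, then discard
    edges at vertices whose revised quota is exhausted; isolated vertices vanish."""
    qm, qf = dict(q_male), dict(q_female)
    edges = set(engaged)
    for m, f in bound:
        edges.discard((m, f))
        qm[m] -= 1
        qf[f] -= 1
    edges = {(m, f) for (m, f) in edges if qm[m] > 0 and qf[f] > 0}
    adj: Dict[str, Set[str]] = {}
    for m, f in edges:
        adj.setdefault(m, set()).add(f)
    females = {f for _, f in edges}
    return adj, {m: qm[m] for m in adj}, {f: qf[f] for f in females}


def deficiency(Z: Iterable[str], adj, q_male, q_female) -> int:
    """delta(Z): sum of q'_m over Z minus sum of q'_f over the neighborhood N(Z)."""
    Z = list(Z)
    nbrs = set().union(*(adj[m] for m in Z))
    return sum(q_male[m] for m in Z) - sum(q_female[f] for f in nbrs)


def critical_set(adj, q_male, q_female) -> Set[str]:
    """Unique minimal maximum-deficiency set of males, by brute force (tiny graphs only)."""
    males = sorted(adj)
    best, maximizers = None, []
    for r in range(len(males) + 1):
        for Z in combinations(males, r):
            d = deficiency(Z, adj, q_male, q_female)
            if best is None or d > best:
                best, maximizers = d, [set(Z)]
            elif d == best:
                maximizers.append(set(Z))
    # Maximally deficient sets are closed under intersection, so intersect them all.
    return set.intersection(*maximizers)
```

The empty set has deficiency 0, so the critical set is empty exactly when no set of males has positive deficiency. This is the termination condition $Z = \emptyset$ of the main loop.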



Algorithm STRONG (Fig. 1), which is similar to the one described in [9] with some non-trivial
changes, begins by assigning each male to be free (i.e., not assigned to any female)
and each female $f$ to be unmarked. Every iterative stage of the algorithm involves
each free male in turn being temporarily engaged to the female(s) at the head
of his list. If, by gaining a new match, a female $f$ becomes fully-matched or over-matched,
then she is said to be marked, and each pair $(m, f)$ such that $m$ is dominated
in $f$'s list is deleted. This continues until every male is engaged to at least his quota of
females or has an empty list. We then find the reduced assignment
graph $G_R$ and the critical set $Z$ of males. As proved later, no female $f$ in $N(Z)$
can be matched to any of the males in her tail in any strongly stable
matching, so all such pairs are deleted. The iterative step is then reactivated, and this
entire process continues until $Z$ is empty, which must happen eventually since,
whenever $Z$ is found to be non-empty, at least one pair is subsequently deleted from
the preference lists. Let $\mathcal{M}$ be any feasible matching in the final engagement
graph $G$. Then $\mathcal{M}$ is a strongly stable matching unless either (a) some marked
female $f$ is not full in $\mathcal{M}$, or (b) some unmarked female $f$ has fewer
matches in $\mathcal{M}$ than her degree in $G$; in cases (a) and (b) no strongly stable
matching exists.

Algorithm STRONG

  Initialize each person to be available
  Assign each female to be unmarked
  repeat {
      while some male m is under-matched and has a non-empty list
          for each female f at the head of m's list
              engage m to f
              if f is fully-matched or over-matched
                  set f to be marked
                  for each male m' dominated on f's list
                      delete the pair (m', f)
      form the reduced assignment graph
      find the critical set Z of males
      for each female f ∈ N(Z)
          for each male m in the tail of f's list
              delete the pair (m, f)
  } until (Z == ∅)
  Let G be the final engagement graph
  Let M be a feasible matching in G
  if (some marked female is not fully-matched in M) or
     (some unmarked female has fewer matches in M than its degree in G)
      No strongly stable matching exists
  else
      Output the strongly stable matching specified by M

Fig. 1. Algorithm for finding Strongly Stable Matching
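Once $G$ and a feasible matching are available, the final acceptance test of Fig. 1 is simple to express. The sketch below covers only this last step. It assumes the caller supplies the degrees $d_G(f)$, the set of marked females and the quotas. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]  # (male, female)


def final_check(matching: Set[Pair], degree: Dict[str, int], marked: Set[str],
                quota: Dict[str, int]) -> Optional[Set[Pair]]:
    """Last step of Fig. 1: return the matching if it passes, else None.

    matching: a feasible matching M in the final engagement graph G (assumed given)
    degree:   d_G(f), the number of males engaged to each female in G
    marked:   the females marked during the run
    quota:    q_f for each female
    """
    matches = Counter(f for _, f in matching)
    for f in set(degree) | set(marked):
        if f in marked and matches[f] < quota[f]:
            return None  # case (a): a marked female is not fully-matched
        if f not in marked and matches[f] < degree.get(f, 0):
            return None  # case (b): an unmarked female has fewer matches than her degree
    return matching
```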

Lemma 1. If the pair $(m, f)$ is deleted during an execution of Algorithm
STRONG, then it cannot block any matching output by the algorithm.

Proof. Let $\mathcal{M}$ be a matching output by Algorithm STRONG, comprising
feasible pairs that are never deleted, and suppose that some pair $(m, f)$ is
deleted during execution of the algorithm. If $f$ is fully-matched in $\mathcal{M}$, then $f$
strictly prefers her current $q_f$ partners in $\mathcal{M}$ to $m$. Since the algorithm
finds the female-worst¹ strongly stable matching, $(m, f)$ cannot be a
blocking pair in this case.

Now suppose that $f$ is under-matched in $\mathcal{M}$. In order for the
pair $(m, f)$ to be deleted by the algorithm, $f$ must have been marked at some
point during its execution; otherwise $m$, along with the set
of partners of $f$ in $\mathcal{M}$, would have been bound to $f$ and would not have been
deleted from $f$'s list. But a marked female that is not fully-matched in $\mathcal{M}$ causes
the algorithm to report that no strongly stable matching exists, so no matching would
have been output; hence the claim follows. □



Lemma 2. Every male who is fully-engaged or over-engaged in the final engagement
graph $G$ must be fully-matched in any feasible matching $\mathcal{M}$.

Proof. The result is true by definition for any male bound to at least $q_m$ females.
Consider the other males engaged in $G$. Any set $S$ of them must be collectively
adjacent in $G_R$ to a set of females with cumulative quota at least $\sum_{m \in S} q'_m$;
otherwise one of them is in the critical set $Z$, and hence $Z \neq \emptyset$. As a consequence
of Hall's Theorem, this means that all these males are fully-matched in any
maximum-cardinality feasible matching in $G_R$, and hence they must be fully
matched in any feasible matching $\mathcal{M}$. □



Lemma 3. In the final engagement graph $G$, if a male $m$ is bound to more than $q_m$
females, then the algorithm will report that no strongly stable matching exists.

Proof. Suppose the algorithm reports that matching $\mathcal{M}$ is strongly stable. Denote
by $F_1$ the set of over-matched females in $G$, let $F_2 = F \setminus F_1$, and let $d_G(f)$
denote the degree of the vertex $f$ in $G$, i.e., the number of males temporarily
engaged to $f$. Denote by $M_1$ the set of males bound to at most
their specified quota of females in $G$, and let $M_2 = M \setminus M_1$. As a consequence
of Lemma 2, for males $m \in M_2$, who are over-engaged or fully-engaged,
$q'_m = q_m$. We have

$$|\mathcal{M}| = \sum_{m \in M_1} d_G(m) + \sum_{m \in M_2} q'_m. \qquad (1)$$

Furthermore,

$$|\mathcal{M}| = \sum_{f \in F_1} q_f + \sum_{f \in F_2} d_G(f). \qquad (2)$$

If we consider the procedure used for obtaining the reduced assignment graph,
then some male who is over-bound can be matched to only $q_m$ females, and it
follows that

$$\sum_{f \in F_1} (q_f - q'_f) + \sum_{f \in F_2} d_G(f) > \sum_{m \in M_1} d_G(m). \qquad (3)$$

Combining (1), (2) and (3) gives

$$\sum_{f \in F_1} q'_f < \sum_{m \in M_2} q'_m, \qquad (4)$$

which contradicts the fact that the critical set in $G_R$ is empty, and hence the
claim follows. □

¹ There can be no strongly stable matching wherein any female has better partners
than those in the matching found by the algorithm.
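The step from (1), (2) and (3) to (4) can be written out explicitly. The derivation below assumes that (3) has the form $\sum_{f \in F_1} (q_f - q'_f) + \sum_{f \in F_2} d_G(f) > \sum_{m \in M_1} d_G(m)$. It first equates the right-hand sides of (1) and (2), then applies (3).

```latex
\begin{align*}
\sum_{m \in M_2} q'_m
  &= \sum_{f \in F_1} q_f + \sum_{f \in F_2} d_G(f) - \sum_{m \in M_1} d_G(m)
     && \text{by (1) and (2)}\\
  &> \sum_{f \in F_1} q_f - \sum_{f \in F_1} (q_f - q'_f)
     && \text{by (3)}\\
  &= \sum_{f \in F_1} q'_f,
\end{align*}
```

which is exactly (4).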

Lemma 4. Algorithm STRONG outputs a strongly stable matching (if it gives
a matching as output).

Proof. Suppose Algorithm STRONG outputs a matching $\mathcal{M}$, and that $(m, f)$
is a blocking pair for $\mathcal{M}$. Then $m$ and $f$ are acceptable to each other, so that
each is on the original preference list of the other. As a consequence of Lemma 1,
the pair $(m, f)$ has not been deleted, which implies that $m$ and $f$ are on each other's
reduced lists. Since $(m, f)$ is a blocking pair for $\mathcal{M}$ and $f$ is in $m$'s
reduced preference list, it cannot be the case that $f \prec_m \mathcal{M}_m$. Hence $f$ must be tied
with the tail of $\mathcal{M}_m$, and therefore $m \prec_f \mathcal{M}_f$. Therefore, $m$ is bound to $f$ in
the final engagement graph $G$.

The fact that $m$ is bound to $f$ in $G$ but not matched with $f$ implies, by the
condition for $\mathcal{M}$ to be a feasible matching, that $m$ is bound to at least $q_m$
partners other than $f$ in $G$. Hence $m$ is bound to more than $q_m$ females in $G$,
and by Lemma 3 the algorithm would report that no strongly stable matching exists,
contradicting the assumption that it outputs the matching $\mathcal{M}$. □

Lemma 5. No strongly stable pair is ever deleted during an execution of Algorithm
STRONG.

Proof. Suppose that the pair $(m, f)$ is the first strongly stable pair deleted
during some execution of the algorithm, and let $\mathcal{M}$ be a strongly stable
matching in which $m$ is matched to $f$ (amongst his other partners, if any). There
are two cases to consider.

On one hand, suppose $(m, f)$ is deleted because $m$ is dominated in
$f$'s list. Call the set of males in the tail of $\mathcal{M}_f$ (at this point) $M'$. None of
the males in $M'$ can be assigned, in any strongly stable matching, to any female
whom they prefer to $f$ and with whom they are not matched, for otherwise some
strongly stable pair must have been deleted before $(m, f)$, which contradicts our
assumption. Moreover, none of the males in $M'$ can be bound to $f$, as in that
case $(m, f)$ would not be a strongly stable pair. Therefore, in $\mathcal{M}$, at least one
of the males in $M'$, say $m''$, cannot be assigned to $f$, so that $f \preceq_{m''} \mathcal{M}_{m''}$. It
follows that $(m'', f)$ blocks $\mathcal{M}$, a contradiction.

On the other hand, suppose $(m, f)$ is deleted because $f$ is matched
to some other male in the critical set $Z$ at some point, and $m$ is in $f$'s tail.
We refer to the set of lists at that point as the current lists. It is important to
note that after the reduced assignment graph is constructed, the modified lists
of females contain sets of equally favorable males (the tail of the original list). Let
$Z'$ be the set of males in the critical set $Z$ who are fully assigned in the feasible
matching $\mathcal{M}$ and are matched to $q'_m$ females in their current list, and let $Y'$ be
the set of females in $N(Z)$ who are assigned in $\mathcal{M}$ to at least one male from
the tail of their current list. Then $f \in Y'$, since $(m, f)$ gets deleted, implying
that $f$ must be matched to some male in the tail of her current list; so $Y' \neq \emptyset$.
Furthermore, any male $m'$ in $Z$ who is engaged to $f$ must be in $Z'$, for otherwise
$(m', f)$ would block $\mathcal{M}$, which implies $Z' \neq \emptyset$. We now claim that $N(Z \setminus Z')$ is
not contained in $N(Z) \setminus Y'$. For, suppose that the containment does hold.
Then

$$\delta(Z \setminus Z') = \sum_{m \in Z \setminus Z'} q'_m - \sum_{f \in N(Z \setminus Z')} q'_f \ge \sum_{m \in Z \setminus Z'} q'_m - \sum_{f \in N(Z) \setminus Y'} q'_f$$
$$= \sum_{m \in Z} q'_m - \sum_{f \in N(Z)} q'_f - \Big(\sum_{m \in Z'} q'_m - \sum_{f \in Y'} q'_f\Big).$$

But $\sum_{m \in Z'} q'_m - \sum_{f \in Y'} q'_f \le 0$, because every male in $Z'$ is matched in $\mathcal{M}$ with
a female in $Y'$. Hence $Z \setminus Z'$ has deficiency greater than or equal to that of $Z$,
contradicting the fact that $Z$ is the critical set. Thus the claim is established.

Hence there must be a male $m_1 \in Z \setminus Z'$ and a female $f_1 \in Y'$ such that $m_1$
is engaged to $f_1$ at some point but they are not matched in the final matching.
Since $f_1 \prec_{m_1} \mathcal{M}_{m_1}$ and $m_1 \preceq_{f_1} \mathcal{M}_{f_1}$, the pair $(m_1, f_1)$ blocks $\mathcal{M}$, and we reach
a contradiction. □

Lemma 6. Let $\mathcal{M}$ be a feasible matching in the final engagement graph $G$. If
(a) some unmarked female $f$ has fewer matches in $\mathcal{M}$ than the number of her
engagements in $G$, or (b) some marked female $f$ is not full in $\mathcal{M}$, then no
strongly stable matching exists.

Proof. Suppose that condition (a) or (b) is satisfied, and that $\mathcal{M}'$ is a strongly
stable matching for the instance. Every male fully-engaged (or over-engaged) in
the final engagement graph $G$ must be fully matched in $\mathcal{M}$ (by Lemma 2),
and any male engaged in $G$ to fewer females than his specified quota must be
matched in $\mathcal{M}'$ to at most as many females as he is engaged to in $G$. It follows that
$|\mathcal{M}'| \le |\mathcal{M}|$, where $|\mathcal{M}|$ is defined to be the sum of the matches of males in $\mathcal{M}$.

If (a) or (b) is true, then $|\mathcal{M}_f| < \min(d_G(f), q_f)$ for some female $f$, where $d_G(f)$
is the degree of $f$ in $G$, i.e., the number of males engaged to $f$ in $G$. Hence
also, for some $f'$, $|\mathcal{M}'_{f'}| < \min(d_G(f'), q_{f'})$. So there is a female $f'$ and a male
$m' \notin \mathcal{M}'_{f'}$ such that (i) $f'$ is not full in $\mathcal{M}'$ and (ii) $m'$ was engaged to $f'$ in
$G$. Hence $(m', f')$ is a blocking pair for $\mathcal{M}'$, contradicting the assumption that it is a strongly
stable matching. □

Combining these lemmas leads to the following theorem. 

Theorem 1. For a given instance of MMSMPT, Algorithm STRONG determines
whether or not a strongly stable matching exists. If such a matching does
exist, all possible executions of the algorithm find one in which every assigned
male is matched to as favorable a set of females as in any strongly stable matching.
4 Structure of Strongly Stable Marriages

In this section, we closely study the structure of strongly stable matchings in the context of the multiple partner stable marriage problem with ties and, interestingly, obtain results that are counterparts to the results of [13] for the one-to-one version of the problem.

Lemma 7. For a given MMSMPT instance, let $M$ be the matching obtained by algorithm STRONG and let $M'$ be any strongly stable matching. If any female $f$ is under-matched in $M'$, then every male matched to $f$ in $M$ is also assigned to $f$ in $M'$.

Proof. Suppose that some male $m$ is matched to $f$ in $M$ but not in $M'$. Then $(m, f)$ blocks $M'$, since $f$ is under-matched in $M'$ and, by Theorem 1, $f$ cannot be dominated by the set of partners of $m$ in $M'$. □

Lemma 7 and the fact that algorithm STRONG matches every male to the most 
favorable set of females he can get matched to in any strongly stable matching, 
lead to the following theorem. 

Theorem 2. For a given MMSMPT instance, every male (female) is matched to the same number of females (males) in every strongly stable matching. Furthermore, any person who is under-matched in some strongly stable matching is matched to the same set of partners in every strongly stable matching.

Moreover, it can be shown that the set of strongly stable marriages for an instance of MMSMPT forms a finite distributive lattice. Before proceeding, we state some definitions for the sake of clarity of exposition.

Definition 3. Let $\mathfrak{M}$ be the set of strongly stable matchings for a given instance of MMSMPT. We define an equivalence relation $\sim$ on $\mathfrak{M}$ as follows. For any two strongly stable matchings $M_1, M_2 \in \mathfrak{M}$, $M_1 \sim M_2$ if and only if each male is indifferent between $M_1$ and $M_2$. Denote by $\mathcal{E}$ the set of equivalence classes of $\mathfrak{M}$ under $\sim$, and denote by $[M]$ the equivalence class containing $M \in \mathfrak{M}$.

Definition 4. Let $M_1$ and $M_2$ be two strongly stable matchings. We define a relation $\succeq$ wherein $M_1 \succeq M_2$ if and only if each male either strictly prefers $M_1$ to $M_2$ or is indifferent between them.

Definition 5. Let $\mathcal{E}$ be the set of equivalence classes defined in Definition 3 for a given instance of MMSMPT. Define a partial order $\leq$ on $\mathcal{E}$ as follows: for any two equivalence classes $[M_1], [M_2] \in \mathcal{E}$, $[M_1] \leq [M_2]$ if and only if $M_1 \succeq M_2$.

For any two strongly stable matchings $M$ and $M'$ for a given instance of MMSMPT, let $\mathrm{In}(M, M')$ denote the set of males $m$ such that $M_m \sim M'_m$, i.e., $m$ is indifferent between $M$ and $M'$.

Lemma 8. Let $M$ and $M'$ be two strongly stable matchings in a given instance of MMSMPT. Let $M^*$ be a set of (male, female) pairs defined as follows: for each male $m \in \mathrm{In}(M, M')$, $m$ has in $M^*$ the same partners as in $M$, and for each male $m \notin \mathrm{In}(M, M')$, $m$ has in $M^*$ the better set of his partners in $M$ and $M'$. Then $M^*$ is a strongly stable matching.
Proof. Let us assume that $(m, f)$ is a blocking pair for $M^*$. Without loss of generality, we assume that $m$ prefers $f$ to at least one of his partners in $M^*$. Then $f$ either prefers $m$ to at least one of the males in $M^*_f$ or is indifferent between $m$ and the least favourable partner in $M^*_f$. Furthermore, $m$ prefers $f$ to at least one partner in both $M_m$ and $M'_m$. In either case, $(m, f)$ blocks $M$ or $M'$, depending on whether $M^*_f$ is the same as $M_f$ or $M'_f$. Thus we reach a contradiction, as $M$ and $M'$ are given to be strongly stable.

Otherwise, $m$ could be indifferent between $f$ and the least preferred partner in $M^*_m$. Then $f$ must prefer $m$ to at least one partner in $M^*_f$ for $(m, f)$ to be a blocking pair for $M^*$. By the construction of $M^*$, $m$ prefers $f$ to at least one partner in $M_m$ or is indifferent between $f$ and the least preferred partner in $M_m$, and likewise $m$ prefers $f$ to at least one partner in $M'_m$ or is indifferent between $f$ and the least preferred partner in $M'_m$. Once again we reach a contradiction, as $(m, f)$ blocks $M$ or $M'$, depending on whether $M^*_f$ is the same as $M_f$ or $M'_f$. Therefore, $M^*$ is strongly stable. □

Proceeding in a similar way, we get the following lemma. 

Lemma 9. Let $M$ and $M'$ be two strongly stable matchings in a given instance of MMSMPT. Let $M^*$ be a set of (male, female) pairs defined as follows: for each male $m \in \mathrm{In}(M, M')$, $m$ has in $M^*$ the same partners as in $M$, and for each male $m \notin \mathrm{In}(M, M')$, $m$ has in $M^*$ the worse set of his partners in $M$ and $M'$. Then $M^*$ is a strongly stable matching.

We define the operations $\wedge$ and $\vee$ as follows. $M \wedge M'$ denotes the set of (male, female) pairs in which each male $m \in \mathrm{In}(M, M')$ is matched to the same set of partners as in $M$, and each male $m \notin \mathrm{In}(M, M')$ is matched to the better set of his partners in $M$ and $M'$. Similarly, $M \vee M'$ denotes the set of (male, female) pairs in which each male $m \in \mathrm{In}(M, M')$ is matched to the same set of partners as in $M$, and each male $m \notin \mathrm{In}(M, M')$ is matched to the worse set of his partners in $M$ and $M'$. By Lemma 8 and Lemma 9, $M \wedge M'$ and $M \vee M'$ are strongly stable matchings for the given MMSMPT instance.

It is important to note that, in general, $M \wedge M' \neq M' \wedge M$, because the males $m \in \mathrm{In}(M, M')$ keep their partners from the first argument. A similar relation holds for the $\vee$ operation. Lemma 8 and Lemma 9 lead us to the definition of meet and join operations for the equivalence classes. The following theorem follows using arguments similar to those in [13].

Theorem 3. Given an instance $I$ of MMSMPT, let $\mathfrak{M}$ be the set of strongly stable matchings in $I$. Let $\mathcal{E}$ be the set of equivalence classes of $\mathfrak{M}$ under $\sim$, and let $\leq$ be the partial order on $\mathcal{E}$ as defined in Definition 5. Then $(\mathcal{E}, \leq)$ forms a finite distributive lattice.

The consequence of this theorem is that it is possible to efficiently enumerate 
all the stable matchings in the multiple partner stable marriage case (a result 
known for one-to-one stable marriage version of the problem). 
5 Concluding Remarks

The many-to-many version of the stable marriage problem has wide applications in advanced matchmaking in e-marketplaces, and finding a strongly stable matching is particularly appropriate in such a scenario. In this paper, we presented an efficient algorithm for finding a strongly stable marriage (if one exists) for an instance of the multiple partner stable marriage problem with ties. Furthermore, we showed that the set of strongly stable matchings forms a distributive lattice. By doing so, we generalized some of the results already known for the corresponding one-to-one and many-to-one (hospital-residents problem) versions of the problem.

Very recently, we were referred to research [16] on stable matchings in which the concept of level-maximal matchings is used to bring down the running time of the algorithms for finding strongly stable matchings in the one-to-one and many-to-one versions of the stable marriage problem. The structure of many-to-many stable matchings very closely resembles that of their one-to-one and many-to-one counterparts. As a result, we believe the algorithm in [16] can be extended to many-to-many stable marriages with multiple partners. We consider the formalization of such an algorithm as one of the directions for our future work.
Abstract. In classical network flow theory, flow being sent from a source to a destination may be split into a large number of chunks traveling on different paths through the network. This effect is undesired or even forbidden in many applications. Kleinberg introduced the unsplittable flow problem where all flow traveling from a source to a destination must be sent on only one path. This is a generalization of the NP-complete edge-disjoint paths problem. In particular, the randomized rounding technique of Raghavan and Thompson can be applied. A generalization of unsplittable flows are k-splittable flows where the number of paths used by a commodity $i$ is bounded by a given integer $k_i$.

The contribution of this paper is twofold. First, for the unsplittable flow problem, we prove a lower bound of $\Omega(\log m / \log\log m)$ on the performance of randomized rounding. This result almost matches the best known upper bound of $O(\log m)$. To the best of our knowledge, the problem of finding a non-trivial lower bound has so far been open.

In the second part of the paper, we study a new variant of the k-splittable 
flow problem with additional constraints on the amount of flow being sent 
along each path. The motivation for these constraints comes from the 
following packing and routing problem: A commodity must be shipped 
using a given number of containers of given sizes. First, one has to make 
a decision on the fraction of the commodity packed into each container. 
Then, the containers must be routed through a network whose edges 
correspond, for example, to ships or trains. Each edge has a capacity 
bounding the total size or weight of containers which are being routed 
on it. We present approximation results for two versions of this problem 
with multiple commodities and the objective to minimize the congestion 
of the network. The key idea is to reduce the problem under consideration 
to an unsplittable flow problem while only losing a constant factor in the 
performance ratio. 
1 Introduction

Problem definition. The unsplittable flow problem (UFP) was introduced by Kleinberg [11]: Given a network with capacities on the arcs and several source-sink pairs (commodities) with associated demand values, route the demand of each commodity on exactly one path leading from its source to its sink without violating arc capacities. For the special case of unit capacities and unit demands, we get the edge-disjoint paths problem, which is well known to be NP-complete [8]. Kleinberg [11] introduced the following optimization versions of the unsplittable flow problem. Minimum congestion: Find the smallest value $\alpha \geq 1$ such that there exists an unsplittable flow that violates the capacity of any edge by at most a factor $\alpha$. Minimum number of rounds: Partition the set of commodities into a minimum number of subsets (rounds) and find a feasible unsplittable flow for each subset. Maximum routable demand: Find a feasible unsplittable flow for a subset of demands maximizing the sum of demands in the subset. Here, we are mainly interested in the minimum congestion problem.
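To make the minimum congestion objective concrete, the following Python sketch computes the congestion of a given unsplittable routing and then finds the minimum by exhaustive search over candidate paths. The function names, and the encoding of paths as lists of edge names, are illustrative assumptions. The search is exponential and only suits tiny instances; it is not an algorithm from this paper.

```python
from itertools import product


def congestion(capacity, routing):
    """Smallest alpha >= 1 with load(e) <= alpha * capacity[e] on every edge.

    capacity: dict edge -> positive capacity
    routing: list of (demand, path) pairs; a path is a list of edges
    """
    load = {e: 0.0 for e in capacity}
    for demand, path in routing:
        for e in path:
            load[e] += demand
    return max([1.0] + [load[e] / capacity[e] for e in capacity])


def min_congestion_ufp(capacity, commodities):
    """Try every way of giving each commodity exactly one candidate path.

    commodities: list of (demand, candidate_paths). Exponential time.
    """
    best_alpha, best_choice = float("inf"), None
    for choice in product(*(paths for _, paths in commodities)):
        routing = [(d, path) for (d, _), path in zip(commodities, choice)]
        alpha = congestion(capacity, routing)
        if alpha < best_alpha:
            best_alpha, best_choice = alpha, choice
    return best_alpha, best_choice


if __name__ == "__main__":
    cap = {"a": 1.0, "b": 1.0}  # two parallel unit-capacity edges
    comms = [(1.0, [["a"], ["b"]]), (1.0, [["a"], ["b"]])]
    print(min_congestion_ufp(cap, comms))  # the two commodities split up
```

Halving the capacity of edge `b` in this example raises the minimum congestion to 2, since either edge then has to carry twice what it can.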

A natural generalization of the unsplittable flow problem is the k-splittable 
flow problem (k-SFP) which has recently been introduced by Baier, Köhler, and
Skutella [4]. For each commodity, there is an upper bound on the number of 
paths that may be used to route its demand. Already for the single-commodity 
case, the k-SFP is NP-complete. Of course, the optimization versions of the UFP 
discussed above naturally generalize to the k-SFP. 

In this paper we introduce a new variant of the k-splittable flow problem 
with additional upper bounds on the amount of flow being sent along each path. 
As already discussed in the abstract, this generalization is motivated by transportation problems where divisible goods have to be shipped through a network using containers of given sizes. Each container being used must be routed along some path through the network. In particular, the size of the container induces an upper bound on the amount of flow being sent along the chosen path. It is therefore called path capacity.

An edge of the network corresponds, for example, to a train or ship which 
can be used in order to transport containers from the tail to the head node of 
this edge. Of course, such an edge cannot carry an arbitrary amount of flow. We 
consider two variants of the problem for which we provide an intuitive motivation:

i) When loading a ship, the total weight of all containers on the ship must not 
exceed the capacity of the ship. The weight of a container is determined 
by the actual amount of flow assigned to it. Thus, in the first variant with 
weight capacities, the capacity of an edge bounds the actual amount of flow 
being sent on paths containing this edge. This is the classical interpretation 
of edge capacities. 

ii) When loading a train, the total size of the containers is usually more important than their total weight. Therefore, in the model with size capacities,
the capacity of an edge bounds the total path capacity of all paths being 
routed through that edge. Notice that these capacity constraints are more 
restrictive than the classical ones in the first model. 
We are mainly interested in the corresponding NP-hard optimization problem to
minimize the congestion. A precise and formal definition of the problem under 
consideration is given in Section 2. 

Related results from the literature. The unsplittable flow problem has been well studied in the literature. Raghavan and Thompson [15,14] introduced a randomized rounding technique which gives an $O(\log m)$-approximation algorithm for the problem of minimizing the congestion for the UFP, if the maximum demand is bounded from above by the minimum edge capacity (the balance condition)¹. For any instance of the problem of minimizing the congestion for the UFP, their technique first computes an optimal solution of the related (fractional) multicommodity flow problem and then, for each commodity, chooses exactly one of its flow paths at random, such that each flow path is chosen with probability equal to its value divided by the demand of its commodity. Their procedure even yields a constant factor approximation if either the minimum edge capacity or the congestion of an optimal fractional routing is at least $\Omega(\log m)$.
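The rounding step is simple once a fractional path decomposition is available. The sketch below assumes that decomposition has already been computed (solving the multicommodity flow LP is not shown). It encodes the decomposition as hypothetical lists of `(path, flow)` pairs and keeps one path per commodity with probability equal to the path's flow divided by the demand, using the standard library's `random.choices`.

```python
import random


def randomized_rounding(decomposition, seed=None):
    """decomposition[i] lists (path, flow) pairs of a fractional solution for
    commodity i, with flows summing to its demand d_i. Keeps exactly one
    path per commodity, path P being chosen with probability flow_P / d_i."""
    rng = random.Random(seed)
    chosen = []
    for pairs in decomposition:
        paths = [path for path, _ in pairs]
        flows = [flow for _, flow in pairs]
        # random.choices normalises the weights, so weight flow_P yields
        # probability flow_P / sum(flows) = flow_P / d_i.
        chosen.append(rng.choices(paths, weights=flows, k=1)[0])
    return chosen
```

The whole demand of each commodity is then sent along its chosen path. The quality guarantee comes from analysing the resulting edge loads, which this sketch does not attempt.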

For the objective of maximizing the sum of satisfied demands while meeting all edge capacity constraints, Baveja and Srinivasan [5] gave an approximation ratio of $O(\sqrt{m})$. Azar and Regev [1] gave a strongly polynomial algorithm, also with approximation ratio $O(\sqrt{m})$. Kolman and Scheideler [13] even gave a strongly polynomial $O(\sqrt{m})$-approximation algorithm for the problem without the balance condition. On the other hand, Guruswami, Khanna, Rajaraman, Shepherd, and Yannakakis [9] showed that in the directed case there is no approximation algorithm with performance ratio better than $O(m^{1/2-\varepsilon})$ for any $\varepsilon > 0$, unless P=NP. To get better approximation results, one has to incorporate additional graph parameters into the bound. Baveja and Srinivasan [5] developed a rounding technique to convert an arbitrary solution to an LP relaxation into an unsplittable flow within a factor of $O(\sqrt{m})$ or $O(d)$, where $d$ denotes the length of a longest path in the LP solution. Using a result of Kleinberg and Rubinfeld [10], showing that $d = O(\Delta^2 \alpha^{-2} \log^3 n)$ for uniform capacity graphs with some expansion parameter $\alpha$ and maximal degree $\Delta$, one can achieve a better bound. Kolman and Scheideler [13] use a new graph parameter, the "flow number" $F$, and improve the ratio further to $O(F)$ for undirected graphs; they show that $F = O(\Delta \alpha^{-1} \log n)$. Kolman and Scheideler also showed that no better approximation is possible in general, unless P = NP. Recently, Chekuri and Khanna [6] studied the uniform capacity UFP for the case that the task is to satisfy as many requests as possible. Using results from [1] or [5], they found an $O(n^{2/3})$-approximation algorithm for the problem in undirected graphs, an $O(n^{4/5})$-approximation algorithm for the problem in directed graphs, and an $O(\sqrt{n \log n})$-approximation algorithm for the problem in acyclic graphs. If all commodities share a common source vertex, the unsplittable flow problem becomes considerably easier. Results for the single source unsplittable flow problem have, for example, been obtained in [7,11,12,16].

¹ Unless stated otherwise, the balance condition is always assumed to be met for the UFP.
As mentioned above, the k-SFP was introduced by Baier, Köhler, and Skutella [4]. They first consider a special form of the k-SFP in which the task is to find a feasible k-split routing that uses exactly $k_i$ paths for commodity $i \in \{1, \ldots, K\}$, all of which carry the same amount of flow. They call this problem the uniform exactly-k-SFP. Since this problem still contains the UFP as a special case, it is also NP-hard. But note that any instance of it can be solved by solving a special instance of the UFP on the same graph. Thus, any p-approximation algorithm for the problem of minimizing the congestion for the UFP provides a p-approximation algorithm for the problem of minimizing the congestion for this special k-SFP. To approximate an instance of the general k-SFP, Baier et al. use the fact that a uniform exactly-k-split routing of minimal congestion for any instance of the k-SFP is a k-split routing which approximates the congestion of an optimal k-split routing for the same instance within a factor 2.
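A natural choice for the special UFP instance mentioned above, presumably the one intended, replaces each request by $k_i$ unsplittable copies with demand $d_i/k_i$. Routing every copy on one path then yields exactly $k_i$ paths of equal flow. The sketch assumes requests are given as hypothetical `(source, sink, demand, k)` tuples.

```python
def uniform_exactly_k_to_ufp(requests):
    """requests: list of (source, sink, demand, k) tuples. Returns the UFP
    requests obtained by replacing each request by k copies of demand d/k."""
    ufp = []
    for source, sink, demand, k in requests:
        ufp.extend([(source, sink, demand / k)] * k)
    return ufp
```

Because the copies of a request are routed independently, two copies may end up on the same path. This is harmless here, since the k-SFP definition does not require distinct paths.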

Bagchi, Chaudhary, Scheideler, and Kolman [3] introduced a problem which is quite similar to the k-SFP. They consider fault-tolerant routings in networks. To ensure connection for each commodity under up to $k - 1$ edge failures in the network, they require $k \in \mathbb{N}$ edge-disjoint flow paths per commodity. This problem is called the k-disjoint flow problem (k-DFP). Bagchi et al. [3] also consider the integral splittable flow problem (ISF). Bagchi [2] even extended the considerations of [3] to the k-SFP. The aim of Bagchi et al. is always to maximize the sum of satisfied demands subject to meeting all edge capacity constraints. In the k-DFP, a demand $d$ for any request is to be satisfied by a flow on $k$ disjoint paths, each path carrying $d/k$ units of flow. In the ISF, integral demands need to be satisfied by an arbitrary number of paths, where the flow value of any path has to be integral. In contrast to [4] and the present paper, Bagchi does not admit different bounds on the numbers of paths for different commodities in the k-SFP. For all of the mentioned problems, Bagchi et al. introduce simple greedy algorithms in the style of the greedy algorithms for the UFP given by Kolman and Scheideler [13].
With these algorithms they obtain approximation ratios of 0(k^Flog{kF)) for 
the k-DFP and 0{k^F) for the k-SFP on the conditions, that they have unit 
capacities and the maximum demand is at most k times larger than the minimum edge capacity. Here $F$ is again the flow number of the considered instance of the corresponding problem. For the ISF, Bagchi et al. obtain an approximation ratio of $O(F)$ for any instance with uniform capacities in which the maximum demand is at most the capacity of the edges. The ISF was studied earlier by Guruswami et al. [9], who obtained an approximation ratio of $O(\sqrt{m}\,\Delta \log^2 m)$ for it, where $\Delta$ is the maximum degree of a vertex in the considered graph.

Contribution of this paper. Randomized rounding has proved to be a very successful tool in the design of approximation algorithms for many NP-hard discrete optimization problems during the last decade. The general idea is as follows: Formulate the problem as a 0/1-integer linear program, solve the linear programming relaxation, and randomly turn the resulting fractional solution into an integral solution by interpreting the fractional values as probabilities. This idea was originally introduced in 1987 by Raghavan and Thompson [15] for the edge-disjoint paths problem, which is a special case of the unsplittable flow problem. Although our general understanding of approximability has considerably improved since then, the $O(\log m)$-approximation for the minimum congestion version of the UFP is still the best known. Even worse, it is not known whether a constant factor approximation can be ruled out (unless P=NP), nor has any non-trivial lower bound on the performance of randomized rounding been established. In Section 3 we prove that the performance ratio of randomized rounding for the UFP is $\Omega(\log m / \log\log m)$.

In Section 2 we define the k-splittable flow problem with path capacities. For 
both variants of the problem discussed above, we prove that they have essentially 
the same approximability as the unsplittable flow problem. To be more precise, 
we show that any p-approximation algorithm for the UFP can be turned into 
a 2p-approximation for either variant of our problem. The underlying idea is to 
decompose the solution of the problem into two parts. First, a packing of flow 
into containers is obtained. Then, each container is routed along a path through 
the network. The latter step can be formulated as an unsplittable flow problem. 
In Section 4 and Section 5, respectively, we present simple packing routines for 
both variants of the problem. We show that we do not lose more than a factor 2 
with respect to congestion. Due to space limitations, we omit some details in 
this extended abstract. 

2 Problem Definition and Notation 

An instance of the k-SFP with path capacities consists of a digraph $G = (V, E)$ with edge capacities $c : E \to \mathbb{R}^+$ and a set $\mathcal{T} \subseteq V \times V$ of $K \in \mathbb{N}$ requests or commodities. The $i$th request is denoted by $(s_i, t_i)$. Together with request $i$ we are given its non-negative demand $d_i$ and the number of paths $k_i$ (containers) it can be routed on. The flow value on the $j$th path of commodity $i$ must be bounded by the given path capacity $t^i_j > 0$. We always assume that the path capacities of every commodity suffice to route the entire demand, that is, $\sum_{j=1}^{k_i} t^i_j \geq d_i$ for all $i = 1, \ldots, K$.

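A feasible solution to the k-SFP with path capacities consists of $s_i$-$t_i$-paths $P^i_1, \ldots, P^i_{k_i}$ with corresponding nonnegative flow values $f^i_1, \ldots, f^i_{k_i}$ for each commodity $i$, such that the following requirements are met:

$$\sum_{j=1}^{k_i} f^i_j = d_i \qquad \text{for all } i \in \{1, \ldots, K\}, \tag{1}$$

that is, the whole demand of request $i$ is routed. Moreover,

$$f^i_j \leq t^i_j \qquad \text{for all } i \in \{1, \ldots, K\} \text{ and } j \in \{1, \ldots, k_i\}, \tag{2}$$

that is, the path capacities are obeyed. Finally,

$$\sum_{i=1}^{K} \; \sum_{\substack{j = 1, \ldots, k_i:\\ e \in P^i_j}} f^i_j \leq c(e) \qquad \text{for all } e \in E, \tag{3}$$

$$\sum_{i=1}^{K} \; \sum_{\substack{j:\; f^i_j > 0 \,\wedge\, e \in P^i_j}} t^i_j \leq c(e) \qquad \text{for all } e \in E, \tag{4}$$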



where (3) must hold if we consider the problem with weight capacities, and (4) must hold if we consider the problem with size capacities. That is, the corresponding inequalities ensure that no edge capacity constraint is violated. We do not require that the paths $P^i_1, \ldots, P^i_{k_i}$ are distinct, for $i \in \{1, \ldots, K\}$. Furthermore, we allow a path to have flow value 0. In other words, we may use fewer than $k_i$ paths for commodity $i$.
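Requirements (1) to (4) translate directly into a checker. The Python sketch below uses a hypothetical encoding: each commodity is a demand together with a list of `(edges, flow, path_capacity)` triples, one per container, standing for $P^i_j$, $f^i_j$ and $t^i_j$. The name `is_feasible` is illustrative, and the tolerance `tol` is an added assumption that absorbs floating-point error.

```python
def is_feasible(capacity, commodities, model, tol=1e-9):
    """Check (1), (2) and either (3) (model="weight") or (4) (model="size").

    capacity: dict edge -> c(e)
    commodities: list of (demand, paths); paths holds at most k_i triples
        (edges, flow, path_capacity).
    """
    if model not in ("weight", "size"):
        raise ValueError("model must be 'weight' or 'size'")
    load = {e: 0.0 for e in capacity}
    for demand, paths in commodities:
        # (1): the whole demand of the commodity is routed.
        if abs(sum(flow for _, flow, _ in paths) - demand) > tol:
            return False
        for edges, flow, t in paths:
            # (2): flow values are nonnegative and obey the path capacity.
            if flow < -tol or flow > t + tol:
                return False
            if flow <= tol:
                continue  # zero-flow paths count in neither (3) nor (4)
            for e in edges:
                # (3) charges the actual flow, (4) the full container size.
                load[e] += flow if model == "weight" else t
    return all(load[e] <= capacity[e] + tol for e in capacity)
```

For instance, two containers of size 10 carrying 4 and 7 units over a single edge of capacity 15 pass the weight check (load 11) but fail the size check (load 20). This illustrates why (4) is the more restrictive constraint.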

In the following, we use the notion routing for a pair $(\mathcal{R}, (f_P)_{P \in \mathcal{R}})$, where $\mathcal{R}$ is a set of paths in the considered graph and $f_P$ is the flow value on path $P \in \mathcal{R}$. We use the notion k-split routing for a routing of $\mathcal{T}$ that routes the whole demand of each commodity and uses at most $k_i$ paths for commodity $i$. A feasible k-split routing is one which additionally satisfies all edge capacity constraints.

Note that the k-SFP with path capacities is NP-hard, because it contains the unsplittable flow problem as a special case (set $k_i = 1$ and $t^i_1 = d_i$ for all $i = 1, \ldots, K$). Even the single commodity version is NP-hard, since it is a generalization of the NP-hard single commodity k-SFP [4]. Therefore, we are interested in approximate solutions to the problem. The congestion of a routing is the maximum factor by which the capacity of an edge in the graph is violated. Consider any k-split routing. For either variant of the k-SFP, we can simply define the congestion of this routing as the smallest value $\alpha \geq 1$ for which multiplying the right-hand side of (3) (or (4), respectively) by $\alpha$ makes this inequality true.
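Under this definition, the congestion is the largest ratio of edge load to edge capacity, floored at 1. In the minimal Python sketch below, each path of a k-split routing is given as a hypothetical `(edges, flow, path_capacity)` triple, and edge capacities are assumed to be positive.

```python
def congestion(capacity, paths, model):
    """Smallest alpha >= 1 such that scaling the right-hand side of (3)
    (model="weight") or (4) (model="size") by alpha makes it hold.

    paths: (edges, flow, path_capacity) triples over all commodities.
    """
    load = {e: 0.0 for e in capacity}
    for edges, flow, t in paths:
        if flow <= 0:
            continue  # unused containers are ignored in both models
        for e in edges:
            load[e] += flow if model == "weight" else t
    return max([1.0] + [load[e] / capacity[e] for e in capacity])
```

With two containers of size 10 carrying 4 and 7 units over one edge of capacity 10, this gives congestion 1.1 in the weight model and 2 in the size model.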

3 A Lower Bound for Randomized Rounding 

In this section we consider the unsplittable flow problem, i.e., the special case $k_i = 1$ and $t^i_1 = d_i$ for all $i = 1, \ldots, K$. We present a family of instances for which the congestion of an unsplit routing found by randomized rounding is always a factor $\Omega(\log m / \log\log m)$ away from the congestion of an optimum solution.
The performance of randomized rounding depends on the particular choice of 
an optimum fractional solution. We assume that an optimum fractional solution 
with a minimum number of paths is used. 

Theorem 1. The performance ratio of randomized rounding for the minimum congestion version of the unsplittable flow problem is $\Omega(\log m / \log\log m)$. The result even holds if we start with an optimum fractional solution with a minimum number of paths.

Fig. 1. An instance of the unsplittable flow problem consisting of $K$ commodities with unit demands, sharing a common source $s$ and a common sink $t$. There are $K^K$ consecutive pairs of parallel edges. The numbers at the edges indicate capacities (1 on the upper edges, $K-1$ on the lower edges).

Proof. Consider the instance of the UFP depicted in Figure 1. Obviously, there exists an unsplit routing with congestion 1 where the flow value on any edge is equal to its capacity. Of course, the latter property must also hold for any optimum fractional solution. Consider the following fractional path decomposition: Each commodity $i = 1, \ldots, K$ is distributed among $K$ paths $P^i_1, \ldots, P^i_K$ with flow values equal to $1/K$. The upper edges of the graph are labeled by $K$-tuples $(j_1, \ldots, j_K) \in \{1, \ldots, K\}^K$ such that every edge gets a different label. For $i, j = 1, \ldots, K$, path $P^i_j$ uses an upper edge labeled $(j_1, \ldots, j_K)$ if and only if $j_i = j$. The underlying intuition of this construction is that, for each subset of paths containing exactly one path of each commodity, there is an upper edge that is used by exactly the paths in the considered subset.

We argue that, for the described path decomposition, randomized rounding always yields an unsplit routing of congestion $K$. By construction of the fractional path decomposition, for every possible random choice, there exists an upper edge of unit capacity which is used by all $K$ commodities. In particular, the flow on this edge exceeds its capacity by a factor $K$. Since $m = 2K^K$, we get $K = \Omega(\log m / \log\log m)$.

Unfortunately, the fractional path decomposition used above is somewhat artificial. By perturbing arc capacities and demand values, one can ensure that our path decomposition is minimal and still yields the same lower bound. Further details are omitted due to space limitations. □
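The counting argument in this proof can be checked mechanically for small $K$. The sketch below rebuilds the fractional decomposition on the $K^K$ pairs of parallel edges and measures two quantities: the congestion of the fractional solution, and the smallest congestion over all $K^K$ rounding outcomes. It assumes, as in Figure 1, that upper edges have capacity 1 and lower edges capacity $K-1$. The function name `check_construction` is hypothetical, and labels are indexed from 0 rather than 1. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import product


def check_construction(K):
    """Return (fractional congestion, smallest congestion over all rounding
    outcomes, number of edges m) for the Figure 1 instance with K >= 2."""
    labels = list(product(range(K), repeat=K))  # one upper edge per K-tuple
    # Fractional decomposition: path P^i_j carries 1/K and uses the upper
    # edge with label L iff L[i] == j; otherwise it uses the lower edge.
    frac = Fraction(0)
    for label in labels:
        users = sum(1 for i in range(K) for j in range(K) if label[i] == j)
        upper = Fraction(users, K)  # flow on the upper edge (capacity 1)
        lower = K - upper           # rest of the K unit demands (capacity K-1)
        frac = max(frac, upper / 1, lower / (K - 1))
    # Rounding keeps one path choice[i] per commodity (unit demand each).
    best_rounding = None
    for choice in product(range(K), repeat=K):
        worst = Fraction(0)
        for label in labels:
            upper = sum(1 for i in range(K) if label[i] == choice[i])
            worst = max(worst, Fraction(upper), Fraction(K - upper, K - 1))
        if best_rounding is None or worst < best_rounding:
            best_rounding = worst
    return frac, best_rounding, 2 * K ** K
```

For $K = 3$ this reports fractional congestion 1, while even the best rounding outcome has congestion 3 on $m = 54$ edges. The edge whose label equals the chosen tuple always carries all $K$ units.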

4 The k-Splittable Flow Problem with Weight Capacities 

In this section we consider the first variant of the k-SFP with path capacities, 
i.e., the case of weight capacities. We present a simple packing strategy which 
allows us to reduce the k-SFP with path capacities to the UFP while losing at 
most a factor 2 with respect to optimum congestion. We decompose the demand $d_i$ of commodity $i$ into $k_i$ pieces $\mathrm{load}^i_1, \ldots, \mathrm{load}^i_{k_i}$ and then replace request $i$ by $k_i$ copies of $i$, whose demands are set to the values of the demand pieces. To
make the procedure more vivid, one may always think of packing ki containers 
for each commodity i which are then routed along certain paths through the 
network. 

After defining the load for each container, we consider an optimum solution to the k-SFP and interpret its flow values as capacities of bins. This interpretation comes from the idea that we can assign our loads to these resulting bins for the purpose of sending each load (container) along the same path as the corresponding bin takes in the optimum solution. We show that one can find an assignment of loads to bins such that the bins are overloaded by a factor of at most 2. Consequently, there exists a solution which uses our packing of containers and has a congestion of at most twice the minimum congestion.

Fig. 2. An example for the loading of the containers (panels: containers, bins, loads). The beams represent the containers, the striped part the loading defined in (5), and the gray-filled part the loads of an optimal solution.

For the purpose of assigning the loads to the bins with a small overload factor, 
it appears to be reasonable to break the demand of any commodity into pieces 
which are as small as possible. In other words, we want to load the containers such 
that the maximum load is minimized. One can visualize this by picturing that all 
containers of the same commodity are lined up in a row and merged with each 
other and get filled up with water; see Figure 2. To simplify notation, we assume without loss of generality that $t^i_1 \leq t^i_2 \leq \cdots \leq t^i_{k_i}$ for all $i = 1, \ldots, K$. Then, we can assume that the loads $\mathrm{load}^i_j$ of commodity $i$ are indexed in nondecreasing order, too. A precise description of the loading strategy depicted in Figure 2 is as follows:



For any $i \in \{1, \ldots, K\}$, we can simply start with $j = 1$ and then calculate the loads in increasing order. Note that two containers of the same size get the same load. Thus, it even suffices to calculate the load for only one container per size.
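The formula for the loading did not survive extraction. The Python sketch below therefore assumes the water-filling recurrence $\mathrm{load}^i_j = \min\big(t^i_j,\ (d_i - \sum_{j' < j} \mathrm{load}^i_{j'}) / (k_i - j + 1)\big)$. It matches the description above: containers are sorted by size, loads are computed in increasing order of $j$, the maximum load is minimized, and equal-sized containers get equal loads. Treat it as a reconstruction, not as the paper's equation (5).

```python
def water_fill_loads(demand, path_capacities):
    """Loads for the containers of one commodity, in nondecreasing order of
    container size, obtained by 'pouring' the demand into them."""
    t = sorted(path_capacities)
    if sum(t) < demand:
        raise ValueError("path capacities cannot hold the demand")
    loads, remaining, k = [], demand, len(t)
    for j, cap in enumerate(t):
        # Spread what is left evenly over the k - j containers not yet
        # filled, but never exceed this container's size.
        load = min(cap, remaining / (k - j))
        loads.append(load)
        remaining -= load
    return loads
```

For a demand of 10 and container sizes 1, 2, 5 and 5, the two small containers are filled completely, and the remaining 7 units are split evenly, giving loads 1, 2, 3.5 and 3.5.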

Now consider any optimum solution $F$ to the underlying problem of minimizing the congestion for the k-SFP with path capacities. Denote its flow values by $\mathrm{bin}^i_1, \ldots, \mathrm{bin}^i_{k_i}$ for all $i = 1, \ldots, K$. If we arrange the bins in nondecreasing order for all $i = 1, \ldots, K$, then we can prove the following lemma.
Lemma 1. For all $i \in \{1, \ldots, K\}$ the following holds: If for some $j \in \{1, \ldots, k_i\}$ it holds that $\mathrm{load}^i_j > \mathrm{bin}^i_j$, then $\mathrm{load}^i_{j'} \geq \mathrm{bin}^i_{j'}$ for all $j' < j$.

Proof. Since F is an optimal and thus feasible solution, no bin is larger than 
its path capacity. Hence, a violation of the claim would contradict our loading 
strategy; see also Figure 2. □ 

Next we prove that we can pack $k_i$ items of sizes $\mathrm{load}^i_1, \ldots, \mathrm{load}^i_{k_i}$ into $k_i$ bins of sizes $\mathrm{bin}^i_1, \ldots, \mathrm{bin}^i_{k_i}$ without exceeding the bin sizes by more than a factor 2. This implies that we can simultaneously route all demands $d_i$ in the instance of the UFP mentioned above with a congestion of at most twice the congestion of the optimum solution $F$. (Recall that the instance of the UFP resulted from replacing each request in $\mathcal{T}$ by exactly $k_i$ copies and giving the copies the demands $\mathrm{load}^i_1, \ldots, \mathrm{load}^i_{k_i}$.) The proof of the following lemma is omitted due to space limitations.

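Lemma 2. Let $a_1 \leq a_2 \leq \cdots \leq a_n$ and $b_1 \leq b_2 \leq \cdots \leq b_n$ be nonnegative real numbers with $\sum_{i=1}^n a_i = \sum_{i=1}^n b_i$ satisfying the following: If $a_l > b_l$ for some $l \in \{1, \ldots, n\}$, then $a_i \geq b_i$ for all $i = 1, \ldots, l$. Then there exists an assignment $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that

$$\sum_{j:\, f(j) = i} a_j \leq 2\, b_i \qquad \text{for all } i = 1, \ldots, n.$$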
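Since the proof of Lemma 2 is omitted, its conclusion can at least be sanity-checked on small inputs by exhaustive search. The Python sketch below (helper names hypothetical) first tests the premises of the lemma. It then computes the smallest achievable overload factor over all $n^n$ assignments of items to bins. It relies on reading the lost conclusion as "bin $i$ receives at most $2 b_i$", and it is a check, not the paper's proof.

```python
import math
from itertools import product


def satisfies_hypothesis(a, b):
    """Premises of Lemma 2 for item sizes a and bin sizes b."""
    if a != sorted(a) or b != sorted(b) or not math.isclose(sum(a), sum(b)):
        return False
    for l in range(len(a)):
        # If a_l > b_l, every earlier item must be at least its bin.
        if a[l] > b[l] and any(a[i] < b[i] for i in range(l)):
            return False
    return True


def _ratio(fill, size):
    if size > 0:
        return fill / size
    return 0.0 if fill == 0 else math.inf  # an empty bin of size 0 is fine


def best_overload(a, b):
    """Smallest max_i (total size assigned to bin i) / b_i over all
    assignments of items to bins; exponential, tiny n only."""
    n = len(a)
    best = math.inf
    for f in product(range(n), repeat=n):
        fill = [0.0] * n
        for item, bin_index in enumerate(f):
            fill[bin_index] += a[item]
        best = min(best, max(_ratio(fill[i], b[i]) for i in range(n)))
    return best
```

With the water-filled loads `[1.0, 2.0, 3.5, 3.5]` from a demand of 10, and an admissible bin vector such as `[0.0, 2.0, 3.0, 5.0]`, the premises hold. The best overload is $3.5/3 \approx 1.17$, well within the factor 2.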

Note that for each $i \in \{1, \ldots, K\}$ the loads and the bins satisfy the requirements of Lemma 2. Thus, we can assign all loads to the bins without exceeding the bin capacities by a factor greater than 2. Now consider the following instance of the UFP: Take the underlying k-SFP with path capacities. For all $i \in \{1, \ldots, K\}$, replace request $i$ by $k_i$ copies of $i$ with demands equal to $\mathrm{load}^i_1, \ldots, \mathrm{load}^i_{k_i}$. Using the results from above, we obtain the following:

Theorem 2. For the problem of minimizing congestion, a p-approximate solu- 
tion to the instance of the UFP described above yields a 2p-approximate solution 
to the underlying instance of the k-SFP with path capacities (weight capacities). 

We conclude this section with some remarks on the running time of the resulting 2p-approximation algorithms for the k-SFP with path capacities. Notice that the size of the instance of the UFP described above is not necessarily polynomial in the input size. The number of copies $k_i$ of request $i$ can be exponential if we are only given the different sizes of containers and the number of containers for each size in the input of the k-SFP with path capacities. As already mentioned above, computing the loads in polynomial time is not a problem. However, in order to achieve polynomial running time when solving the resulting instance of the UFP, we must use a polynomial time algorithm for the UFP which is able to handle a compact input format where, for each request, the number of copies of the request is given. In fact, most approximation algorithms for the UFP fulfill this requirement or can easily be modified to fulfill it. As an example, we mention the randomized rounding procedure: In a fractional



Flows on Few Paths: Algorithms and Lower Bounds 529 



solution, all copies of a request can be treated as one commodity for which a 
path decomposition using at most m paths exists. Randomized rounding then 
assigns to each of these paths a certain number of copies of the corresponding 
request. This procedure can easily be implemented to run in polynomial time. 
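To make the compact-input remark concrete, here is a minimal Python sketch of one simple randomized rounding variant (our choice for illustration, not necessarily the procedure the authors have in mind) for a single request whose copies are spread fractionally over the paths of a path decomposition. The function name `round_copies` and the representation of the fractional solution as a list of per-path flow values are assumptions of this sketch.

```python
import math
import random
from fractions import Fraction


def round_copies(path_flow, rng):
    """Randomly assign the copies of one request to the paths of its decomposition.

    path_flow[P] is the fractional number of copies routed along path P; the
    values must add up to the (integer) number of copies. Each path first gets
    floor(path_flow[P]) copies; each leftover copy then picks a path with
    probability proportional to its fractional part, so path P receives
    path_flow[P] copies in expectation. The work depends on the number of
    paths, not on the (possibly exponential) number of copies.
    """
    total = sum(path_flow)
    if total != int(total):
        raise ValueError("flow values must add up to an integer number of copies")
    counts = [math.floor(x) for x in path_flow]
    fractional = [x - c for x, c in zip(path_flow, counts)]
    leftover = int(total) - sum(counts)  # always fewer leftover copies than paths
    for _ in range(leftover):
        p = rng.choices(range(len(path_flow)), weights=fractional)[0]
        counts[p] += 1
    return counts


# Example: 5 copies split fractionally as 5/2, 3/2 and 1 over three paths.
print(round_copies([Fraction(5, 2), Fraction(3, 2), Fraction(1)], random.Random(1)))
```

Since a path receives only the floor of its flow plus a number of leftover copies bounded by the number of paths, the loop runs fewer times than there are paths.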



5 The k-Splittable Flow Problem with Size Capacities 

In this section we consider the second variant of the k-SFP with path capacities, 
i.e., the case of size capacities. If, for every commodity $i$, all path capacities are equal to some uniform
value $T_i$, it is easy to observe that there always exists an
optimum solution using the minimum number of paths $\lceil d_i/T_i \rceil$ for commodity $i$.
Therefore, this problem can be formulated as a ‘uniform exactly-k-SFP’ (see [4] 
for details). In particular, all results for the latter problem obtained in [4] also 
hold for the k-SFP with uniform path capacities. 

We turn to the general problem. As in the last section, we assume that
$t^i_1 \le \dots \le t^i_{k_i}$ for all $i = 1, \dots, K$. Our general approach is identical to the
one presented in the last section. Again, we first give a simple strategy for the
packing of containers. This reduces the problem to a UFP. We prove that at
most a factor 2 in the performance is lost due to this packing by comparing it
to the packing of an optimum solution (‘bins’).

In contrast to the last section, it seems reasonable to load each container
that is used up to its size. Notice that only the size of a container (and not its
actual load) matters for the routing. Among all containers, we try to use the
smallest ones in order to be able to obtain reasonable assignments to bins later.
The load $\mathrm{load}^i_j$ of the $j$th container of commodity $i$ is determined as follows:
Consider the containers of each commodity in order of decreasing size. If the 
whole (remaining) demand of the commodity fits into the remaining containers 
with smaller index, then discard the container. Otherwise, the container is filled 
up to its size (unless the remaining demand is smaller than the size) and the 
remaining demand is decreased by this value. 
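The loading rule can be sketched in Python for a single commodity. In this sketch indices are 0-based, the container sizes are assumed to be sorted non-decreasingly, and the name `load_containers` is hypothetical.

```python
def load_containers(sizes, demand):
    """Loading strategy for size capacities, for one commodity.

    sizes: container sizes t_1 <= ... <= t_k (0-based here); demand: d.
    Returns the loads; load[j] > 0 iff container j is used (j in B).
    """
    if any(a > b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be sorted in non-decreasing order")
    prefix = [0]
    for t in sizes:
        prefix.append(prefix[-1] + t)  # prefix[j] = total size of containers < j
    loads = [0] * len(sizes)
    remaining = demand
    for j in range(len(sizes) - 1, -1, -1):  # containers in order of decreasing size
        if remaining <= prefix[j]:  # remaining demand fits into smaller containers
            continue                # so container j is discarded
        loads[j] = min(sizes[j], remaining)  # fill up to its size
        remaining -= loads[j]
    if remaining > 0:
        raise ValueError("demand exceeds the total container size")
    return loads


print(load_containers([1, 2, 4], 5))  # [1, 0, 4]
```

With sizes $(1, 2, 4)$ and demand 5, the largest container is filled completely, the middle one is skipped because the remaining demand 1 fits into the smallest container, and the smallest one takes the rest.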

The above strategy finds the loads in polynomial time if the path capacity of 
every path is explicitly given in the input. If the encoding of the input is more 
compact and, for each commodity i, only specifies the different path capacities 
together with the number of paths for each capacity, the algorithm can easily be 
adapted to still run in polynomial time. 

In the following we consider a fixed optimum solution to the k-SFP with
path capacities. For all $i \in \{1, \dots, K\}$, let $O_i \subseteq \{1, \dots, k_i\}$ such that $j \in O_i$
if and only if the $j$th container is used by the optimum solution. Similarly, let
$B_i \subseteq \{1, \dots, k_i\}$ such that $j \in B_i$ if and only if our packing uses the $j$th container,
i.e., $\mathrm{load}^i_j > 0$. The following lemma says that the containers used by our loading
can be packed into the containers used by the optimum solution such that the
size of any container of the optimum solution is exceeded by at most a factor 2.
Lemma 3. For all $i = 1, \dots, K$, there exists an assignment $f : B_i \to O_i$ such
that

$$\sum_{j \in B_i :\, f(j) = j'} t^i_j \;\le\; 2\, t^i_{j'} \qquad \text{for all } j' \in O_i. \tag{6}$$

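Proof. In the following we keep $i$ fixed; to simplify notation, we omit all indices
and superscripts $i$. By construction of our packing strategy,

$$\sum_{j \in B:\, j > j_0} t_j + \sum_{j < j_0} t_j \;<\; \sum_{j \in O} t_j \qquad \text{for all } j_0 \in B. \tag{7}$$

(When container $j_0$ is considered, all larger containers in $B$ have been filled completely and the remaining demand exceeds the total size of all smaller containers; moreover, the demand is at most $\sum_{j \in O} t_j$.)
In particular, it follows from (7) that the maximum index in $O$ is at least as large
as the maximum index in $B$, and therefore the largest container in the optimum
solution is at least as large as the largest container in our packing. While $B \ne \emptyset$,
we construct the assignment $f$ iteratively as follows:

$$j_{\max} := \max\{j \mid j \in O\} \quad\text{and}\quad \hat{j} := \min\Bigl\{j \in B \;\Bigm|\; \sum_{j' \in B:\, j' \ge j} t_{j'} \le 2\, t_{j_{\max}}\Bigr\}.$$

Set $f(j) := j_{\max}$ for all $j \in B$ with $j \ge \hat{j}$; set $O := O \setminus \{j_{\max}\}$ and $B := B \setminus \{j \mid j \ge \hat{j}\}$. Notice that property (6) holds for $j_{\max}$ since, by (7), the container
size $t_{j_{\max}}$ is an upper bound on $t_j$ for all $j \in B$. Moreover, (7) is still fulfilled
for the reduced sets $B$ and $O$. This concludes the proof. □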
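The iterative construction in the proof of Lemma 3 can be sketched as follows for a single commodity. This is an illustration under the assumptions that container sizes are sorted non-decreasingly (0-based indices) and that (7) holds for the given sets $B$ and $O$; the function name is hypothetical.

```python
def assign_containers(sizes, b, o):
    """Iterative construction from the proof of Lemma 3 (one commodity).

    sizes: container sizes t_j, non-decreasing; b: the set B used by our
    packing; o: the set O used by an optimum solution.
    Returns f as a dict mapping every j in B to a container in O.
    """
    b, o, f = set(b), set(o), {}
    while b:
        if not o:
            raise ValueError("O exhausted before B; property (7) does not hold")
        j_max = max(o)
        # j_hat: smallest j in B whose suffix sum over B is at most 2 * t_{j_max}
        j_hat = min((j for j in b
                     if sum(sizes[i] for i in b if i >= j) <= 2 * sizes[j_max]),
                    default=None)
        if j_hat is None:
            raise ValueError("no admissible j_hat; property (7) does not hold")
        for j in b:
            if j >= j_hat:
                f[j] = j_max
        o.discard(j_max)
        b = {j for j in b if j < j_hat}
    return f


print(assign_containers([1, 2, 4], {0, 2}, {1, 2}))  # {0: 2, 2: 2}
```

For sizes $(1, 2, 4)$ with $B = \{0, 2\}$ and $O = \{1, 2\}$, both containers of $B$ go to container 2, whose assigned size $1 + 4 = 5$ respects the bound $2 \cdot 4$ of (6).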

For all $i = 1, \dots, K$ and $j = 1, \dots, k_i$, set $d^i_j := t^i_j$ if $j \in B_i$, and $d^i_j := 0$
otherwise. We define an instance of the UFP by replacing each request $i$ by $k_i$
copies with demands $d^i_1, \dots, d^i_{k_i}$. According to Lemma 3, there exists a solution
to this instance of the UFP whose congestion is at most twice the minimum
congestion for the underlying instance of the k-SFP with path capacities.

Theorem 3. For the problem of minimizing congestion, a $\rho$-approximate solution to the instance of the UFP described above yields a $2\rho$-approximate solution
to the underlying instance of the k-SFP with path capacities (size capacities).

With respect to the running time of the resulting $2\rho$-approximation algorithms for the k-SFP with path capacities, the same remarks hold that we made
in the last section after Theorem 2.

We conclude with the discussion of a special case of the k-SFP with path 
capacities where, for each request i, each of its path capacities is a multiple of 
all of its smaller path capacities. This property holds, for example, if all path 
capacities are powers of 2. In this case, we can use the same loading strategy
as above in order to obtain a $\rho$-approximate solution to the underlying problem
from any $\rho$-approximation algorithm for the UFP. Notice that there is an optimal
solution to the underlying problem which uses the containers given by $B_i$. The
argument is similar to that in the proof of Lemma 3. 
Abstract. We present a randomized algorithm for finding maximum
matchings in planar graphs in time $O(n^{\omega/2})$, where $\omega$ is the exponent
of the best known matrix multiplication algorithm. Since $\omega < 2.38$, this
algorithm breaks through the $O(n^{1.5})$ barrier for the matching problem.
This is the first result of this kind for general planar graphs.

Our algorithm is based on the Gaussian elimination approach to maxi- 
mum matchings introduced in [1]. 



1 Introduction 

A matching of an undirected graph $G = (V, E)$ is a subset $M \subseteq E$ such that
no two edges in $M$ are incident. Let $n = |V|$, $m = |E|$. A perfect matching is a
matching of cardinality $n/2$. The problems of finding a Maximum Matching (i.e. a
matching of maximum size) and, as a special case, finding a Perfect Matching if
one exists, are two of the most fundamental algorithmic graph problems.

Solving these problems in time polynomial in $n$ remained an elusive goal for
a long time until Edmonds [2] gave the first algorithm. Several other algorithms
have been found since then, the fastest of them being the algorithms of Micali and
Vazirani [3], Blum [4] and Gabow and Tarjan [5]. The first of these algorithms
is in fact a modification of the Edmonds algorithm, the other two use different
techniques, but all of them run in time $O(m\sqrt{n})$, which gives $O(n^{2.5})$ for dense
graphs.

The matching problems seem to be inherently easier for planar graphs. For
a start, these graphs have $O(n)$ edges, so $O(m\sqrt{n}) = O(n^{1.5})$. But there is more
to it. Using the duality-based reduction of maximum flow with multiple sources
and sinks to the single source shortest paths problem [6], Klein et al. [7] were able to
give an algorithm finding perfect matchings in bipartite planar graphs in time
$O(n^{4/3} \log n)$. This reduction, however, does not carry over to the case of general
planar graphs.

We have recently shown [1] that extending the randomized technique of
Lovász [8] leads to an $O(n^\omega)$ algorithm for finding maximum matching in general
graphs. In this paper we use similar techniques, together with separator based 

* Research supported by KBN grant 4T11C04425. 
decomposition of planar graphs and the fast nested dissection algorithm, to show
that maximum matchings in planar graphs can be found in time $O(n^{\omega/2})$.

Remark 1. In case $\omega = 2$ an additional logarithmic factor appears, so in the remainder of this paper we assume, for simplicity, that $\omega > 2$.

There is one point to notice here. The $O(n^\omega)$ algorithm for general graphs
presented in [1] is faster than the standard maximum matching algorithms only
if the Coppersmith-Winograd matrix multiplication is used (see [9]). On the
other hand, for our $O(n^{\omega/2})$ algorithm to be faster than the standard $O(n^{1.5})$ algorithms
applied to planar graphs, it is enough to use any $o(n^3)$ matrix multiplication
algorithm, e.g. the classic algorithm of Strassen [10]. This suggests that our
results not only constitute a theoretical breakthrough, but might also give a
new practical approach to solving the maximum matching problem in planar
graphs.

The same techniques can be used to generate perfect matchings in planar 
graphs uniformly at random using arithmetic operations. This extends 

the result of Wilson [11]. Details will be given in the full version of this paper. 

The rest of the paper is organized as follows. In the next section we recall
some well known results concerning the algebraic approach to the maximum
matching problem and the key ideas from [1]. In Section 3 we recall the separator theorem for planar graphs and the fast nested dissection algorithm and
show how these can be used to test planar graphs for perfect matchings in time
$O(n^{\omega/2})$. In Section 4, we present an algorithm for finding perfect matchings in
planar graphs in time $O(n^{\omega/2})$, and in Section 5 we show how to extend it to an
algorithm finding maximum matchings. In all these algorithms we use rational
arithmetic and so their time complexity is in fact larger than $O(n^{\omega/2})$. This issue is addressed in Section 6, where we show that all the computations can be
performed over a finite field $\mathbb{Z}_p$, for a random prime $p$ of polynomial size.

2 Preliminaries 

2.1 Matchings, Adjacency Matrices, and Their Inverses 

Let $G = (V, E)$ be a graph and let $n = |V|$ and $V = \{v_1, \dots, v_n\}$. A skew
symmetric adjacency matrix of $G$ is an $n \times n$ matrix $\tilde{A}(G)$ such that

$$\tilde{A}(G)_{i,j} = \begin{cases} x_{i,j} & \text{if } (v_i, v_j) \in E \text{ and } i < j,\\ -x_{j,i} & \text{if } (v_i, v_j) \in E \text{ and } i > j,\\ 0 & \text{otherwise,} \end{cases}$$

where the $x_{i,j}$ are unique variables corresponding to the edges of $G$.

Tutte [12] observed the following:

Theorem 2. The symbolic determinant $\det \tilde{A}(G)$ is non-zero iff $G$ has a perfect
matching.

Lovász [8] generalized this to

Theorem 3. The rank of the skew symmetric adjacency matrix $\tilde{A}(G)$ is equal
to twice the size of a maximum matching of $G$.

Choose a number $R$ (more on the choice of $R$ in Section 6) and
substitute each variable in $\tilde{A}(G)$ with a random number taken from the set
$\{1, \dots, R\}$. Let us call the resulting matrix the random adjacency matrix of $G$
and denote it $A(G)$. Lovász showed that

Theorem 4. The rank of $A(G)$ is at most twice the size of a maximum matching
in $G$. The equality holds with probability at least $1 - n/R$.

This gives a randomized algorithm for deciding whether a given graph $G$
has a perfect matching: Compute the determinant of $A(G)$. With high probability, this determinant is non-zero iff $G$ has a perfect matching. More precisely,
if the determinant is non-zero then $G$ has a perfect matching, but with small
probability the determinant can be zero even if $G$ has a perfect matching. This
algorithm can be implemented to run in time $O(n^\omega)$ using fast matrix multiplication (where $\omega$ is the matrix multiplication exponent, currently $\omega < 2.38$, see
Coppersmith and Winograd [9]).
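The randomized test can be sketched in Python as follows. As assumptions of this sketch, random values are drawn from $\{1, \dots, p-1\}$ for the prime $p = 2^{31} - 1$ (instead of from $\{1, \dots, R\}$ over the rationals), and the rank is computed over $\mathbb{Z}_p$ with plain $O(n^3)$ Gaussian elimination rather than fast matrix multiplication; all function names are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; our choice of field for this sketch


def random_adjacency_matrix(n, edges, rng, p=P):
    """Skew-symmetric matrix over Z_p with an independent random value per edge."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        x = rng.randrange(1, p)
        a[i][j], a[j][i] = x, (-x) % p
    return a


def rank_mod_p(a, p=P):
    """Rank over Z_p by plain Gaussian elimination (O(n^3))."""
    m = [row[:] for row in a]
    n, rank = len(m), 0
    for col in range(n):
        piv = next((r for r in range(rank, n) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(rank + 1, n):
            if m[r][col]:
                f = m[r][col] * inv % p
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def max_matching_size(n, edges, seed=0):
    """Theorem 4: half the rank of the random adjacency matrix (w.h.p.)."""
    return rank_mod_p(random_adjacency_matrix(n, edges, random.Random(seed))) // 2


def has_perfect_matching(n, edges, seed=0):
    """Full rank (non-zero determinant) iff a perfect matching exists (w.h.p.)."""
    return 2 * max_matching_size(n, edges, seed) == n
```

For the 4-cycle the matrix has full rank, while for the star with three leaves every $4 \times 4$ minor vanishes identically, so the test correctly reports that no perfect matching exists.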

Let $G$ be a graph having a perfect matching and let $A = A(G)$ be its random
adjacency matrix. With high probability $\det A \ne 0$, and so $A$ is invertible. Rabin
and Vazirani [13] showed that

Theorem 5. With high probability, $(A^{-1})_{j,i} \ne 0$ iff the graph $G - \{v_i, v_j\}$ has
a perfect matching.

In particular, if $(v_i, v_j)$ is an edge in $G$, then with high probability $(A^{-1})_{j,i} \ne 0$ iff $(v_i, v_j)$ is allowed, i.e. it is contained in some perfect matching. This follows
from the formula $(A^{-1})_{i,j} = \operatorname{adj}(A)_{i,j} / \det A$, where $\operatorname{adj}(A)_{i,j}$, the so-called
adjoint of $A$, is the determinant of $A$ with the $j$-th row and $i$-th column
removed, multiplied by $(-1)^{i+j}$.
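Theorem 5 can likewise be illustrated in Python: invert the random adjacency matrix over $\mathbb{Z}_p$ and keep the edges $(v_i, v_j)$ with $(A^{-1})_{j,i} \ne 0$. The prime $p = 2^{31} - 1$, the helper names and the cubic-time Gauss-Jordan inversion are assumptions of this sketch.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; our choice of field for this sketch


def random_adjacency_matrix(n, edges, rng, p=P):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        x = rng.randrange(1, p)
        a[i][j], a[j][i] = x, (-x) % p
    return a


def inverse_mod_p(a, p=P):
    """Gauss-Jordan inversion over Z_p; raises if the matrix is singular."""
    n = len(a)
    m = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next((r for r in range(col, n) if m[r][col]), None)
        if piv is None:
            raise ValueError("matrix is singular modulo p")
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], p - 2, p)
        m[col] = [x * inv % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def allowed_edges(n, edges, seed=0):
    """Edges (v_i, v_j) with (A^{-1})_{j,i} != 0, i.e. allowed w.h.p. (Theorem 5)."""
    inv = inverse_mod_p(random_adjacency_matrix(n, edges, random.Random(seed)))
    return [(i, j) for i, j in edges if inv[j][i] != 0]


print(allowed_edges(4, [(0, 1), (1, 2), (2, 3)]))  # middle edge of the path is not allowed
```

On the path $v_0 v_1 v_2 v_3$ the middle edge lies in no perfect matching, and the corresponding entry of the inverse is exactly zero because the minor in question has a zero row.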

2.2 Randomization 

Recall the classic lemma due to Zippel [14] and Schwartz [15] 

Lemma 6. If $p(x_1, \dots, x_m)$ is a non-zero polynomial of degree $d$ with coefficients in a field and $S$ is a finite subset of the field, then the probability that $p$ evaluates
to 0 on a random element $(s_1, s_2, \dots, s_m) \in S^m$ is at most $d/|S|$.

Notice that the determinant of the symbolic adjacency matrices defined in the
previous subsection is in fact a polynomial of degree $n$. Thus, if we substitute
random numbers from the set $\{1, \dots, R\}$ for the variables appearing in these
matrices, then the probability of getting a false zero is at most $n/R$. This
is exactly Theorem 4.

The Zippel-Schwartz Lemma can be used to argue that with high probability
no false zeros appear in our algorithms. We will provide the details in the full
version of this paper. Throughout the rest of this paper, we omit the “with high
probability” phrase in most statements.

Computations performed by our algorithms rely on the underlying field hav- 
ing characteristic zero. That is why we present them over the field of rationals. 
For simplicity of presentation, throughout the paper we assume that we can per- 
form rational arithmetic operations in constant time, which allows us to avoid 
differentiating between the time complexity and the number of arithmetic
operations. In Section 6, we use the Zippel-Schwartz Lemma to show that our
algorithms can be performed over $\mathbb{Z}_p$ for a large enough (polynomial) random
prime $p$.

2.3 Perfect Matchings via Gaussian Elimination 

We now recall a technique, recently developed by the authors, for finding perfect
matchings using Gaussian elimination. This technique can be used to find an
inclusion-wise maximal allowed submatching of any matching in time $O(n^\omega)$,
which is a key element of our matching algorithm for planar graphs. A more
detailed exposition of the Gaussian elimination technique and faster algorithms
for matchings in bipartite and general graphs can be found in [1].

Consider a random adjacency matrix $A = A(G)$ of a graph $G = (V, E)$, where
$|V| = n$, $V = \{v_1, v_2, \dots, v_n\}$. If $(v_i, v_j) \in E$ and $(A^{-1})_{i,j} \ne 0$, then $(v_i, v_j)$ is an
allowed edge. We may thus choose this edge as a matching edge and try to find
a perfect matching in $G' = G - \{v_i, v_j\}$. The problem with this approach is that
edges that were allowed in $G$ might not be allowed in $G'$. Computing the matrix
$A(G')^{-1}$ from scratch is out of the question as this would lead to an $O(n^{\omega+1})$
algorithm for perfect matchings. There is however another way suggested by the
following well known property of the Schur complement:

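Theorem 7 (Elimination theorem). Let

$$A = \begin{pmatrix} a_{1,1} & v^T \\ u & B \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} \hat{a}_{1,1} & \hat{v}^T \\ \hat{u} & \hat{B} \end{pmatrix},$$

where $\hat{a}_{1,1} \ne 0$. Then $B^{-1} = \hat{B} - \hat{u}\hat{v}^T / \hat{a}_{1,1}$.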

The modification described in this theorem is in fact a single step of the
well-known Gaussian elimination procedure. In this case, we are eliminating the
first variable (column) using the first equation (row). Similarly, we can eliminate
from $A^{-1}$ any other variable (column) $j$ using any equation (row) $i$ such that
$(A^{-1})_{i,j} \ne 0$.

In [1] we show that among the consequences of Theorem 7 are a very simple $O(n^3)$
algorithm for finding perfect matchings in general graphs (this is an easy corollary) as well as $O(n^\omega)$ algorithms for finding perfect matchings in bipartite and
general graphs. The last of these requires some additional structural techniques.
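A quick way to convince oneself of Theorem 7 is to check it with exact rational arithmetic on a small matrix. The sketch below uses Python's `fractions.Fraction`; the helper names and the test matrix are our own choices.

```python
from fractions import Fraction


def inverse(m):
    """Exact inverse by Gauss-Jordan elimination with row pivoting."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def drop_first_via_theorem7(a_inv):
    """Given A^{-1} = [[a, v^T], [u, B]] with a != 0, return B - u v^T / a."""
    a = a_inv[0][0]
    if a == 0:
        raise ValueError("the pivot (A^{-1})_{1,1} must be non-zero")
    v = a_inv[0][1:]
    u = [row[0] for row in a_inv[1:]]
    return [[a_inv[i + 1][j + 1] - u[i] * v[j] / a for j in range(len(v))]
            for i in range(len(u))]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [row[1:] for row in A[1:]]  # A without its first row and column
assert drop_first_via_theorem7(inverse(A)) == inverse(B)
```

A single elimination step on $A^{-1}$ thus yields the inverse of the smaller matrix without recomputing it from scratch, which is exactly what makes the approach cheaper than repeated inversion.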
2.4 Matching Verification

We now describe another consequence of Theorem 7, one that is crucial for our
approach to finding maximum matchings in planar graphs. In [1] we have shown
that

Theorem 8. Naive iterative Gaussian elimination without row or column pivoting can be implemented in time $O(n^\omega)$ using a lazy updating scheme.



Remark 9. The algorithm in Theorem 8 is very similar to the classic Hopcroft- 
Bunch algorithm [16]. Being iterative, our algorithm is better suited for proving 
Theorem 10, where we need to skip row-column pairs if the corresponding diag- 
onal element is zero. 

This algorithm has a very interesting application:

Theorem 10. Let $G$ be a graph having a perfect matching. For any matching $M$
of $G$, an inclusion-wise maximal allowed (i.e. extendable to a perfect matching)
submatching $M'$ of $M$ can be found in time $O(n^\omega)$.

Proof. Let $M = \{(v_1, v_2), (v_3, v_4), \dots, (v_{k-1}, v_k)\}$ and let $v_{k+1}, v_{k+2}, \dots, v_n$ be
the unmatched vertices. We compute the inverse $A(G)^{-1}$ of the random adjacency matrix of $G$ and permute its rows and columns so that the row order is
$v_1, v_2, v_3, v_4, \dots, v_n$ and the column order is $v_2, v_1, v_4, v_3, \dots, v_n, v_{n-1}$. Now, perform Gaussian elimination of the first $k$ rows and $k$ columns using the algorithm
of Theorem 8, but if the eliminated element is zero just skip to the next iteration. The eliminated rows and columns correspond to a maximal submatching
$M'$ of $M$. □
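The proof of Theorem 10 can be sketched in Python with a naive $O(n^3)$ elimination in place of the lazy $O(n^\omega)$ scheme of Theorem 8. Instead of permuting rows and columns, the sketch pivots at row $u$, column $v$ and then at row $v$, column $u$ for each matching edge $(u, v)$; since the current matrix stays skew-symmetric, the second pivot equals $-x_{u,v}$ and is non-zero whenever the first one is. Arithmetic is over $\mathbb{Z}_p$ with $p = 2^{31} - 1$ (an assumption), and all names are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; our choice of field for this sketch


def random_adjacency_matrix(n, edges, rng, p=P):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        x = rng.randrange(1, p)
        a[i][j], a[j][i] = x, (-x) % p
    return a


def inverse_mod_p(a, p=P):
    n = len(a)
    m = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next((r for r in range(col, n) if m[r][col]), None)
        if piv is None:
            raise ValueError("matrix is singular modulo p")
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], p - 2, p)
        m[col] = [x * inv % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def eliminate(x, rows, cols, i, j, p=P):
    """One Gaussian elimination step with pivot x[i][j] != 0 (cf. Theorem 7)."""
    inv = pow(x[i][j], p - 2, p)
    for r in rows:
        if r != i and x[r][j]:
            f = x[r][j] * inv % p
            for c in cols:
                if c != j:
                    x[r][c] = (x[r][c] - f * x[i][c]) % p
    rows.discard(i)
    cols.discard(j)


def maximal_allowed_submatching(n, edges, matching, seed=0, p=P):
    """Naive O(n^3) version of the procedure in the proof of Theorem 10."""
    x = inverse_mod_p(random_adjacency_matrix(n, edges, random.Random(seed), p), p)
    rows, cols = set(range(n)), set(range(n))
    kept = []
    for u, v in matching:
        if x[u][v] == 0:  # zero pivot: (u, v) is not allowed any more, skip it
            continue
        eliminate(x, rows, cols, u, v, p)  # pivot at row u, column v
        eliminate(x, rows, cols, v, u, p)  # pivot at row v, column u (= -x[u][v])
        kept.append((u, v))
    return kept
```

On the 6-cycle, for example, the matching $\{(v_1, v_2), (v_4, v_5)\}$ is not extendable as a whole: after $(v_1, v_2)$ is kept, the remaining path $v_3 v_4 v_5 v_0$ has no perfect matching through $(v_4, v_5)$, so that edge is skipped.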

3 Testing Planar Graphs for Perfect Matching 

In this section we show how planar graphs can be tested for perfect matching in
time $O(n^{\omega/2})$. We use the nested dissection algorithm that performs Gaussian
elimination in time $O(n^{\omega/2})$ for a special class of matrices. The results presented
in the next subsection are due to Lipton and Tarjan [17], and Lipton, Rose and
Tarjan [18]. We follow the presentation in [19] as it is best suited for our purposes.

3.1 Sparse LU Factorization via Nested Dissection 

We say that a graph $G = (V, E)$ has an $s(n)$-separator family (with respect to
some constant $n_0$) if either $|V| \le n_0$, or by deleting a set $S$ of vertices such
that $|S| \le s(|V|)$, we may partition $G$ into two disconnected subgraphs with the
vertex sets $V_1$ and $V_2$, such that $|V_i| \le \frac{2}{3}|V|$, $i = 1, 2$, and furthermore each of
the two subgraphs of $G$ defined by the vertex sets $S \cup V_i$, $i = 1, 2$, also has an
$s(n)$-separator family. The set $S$ in this definition is called an $s(n)$-separator in $G$
(we also use the name small separators for $O(\sqrt{n})$-separators) and the partition
resulting from recursive application of this definition is the $s(n)$-separator tree.
Partition of a subgraph of $G$ defines its children in the tree.

The following theorem of Lipton and Tarjan [17] gives an important
example of graphs having $O(\sqrt{n})$-separator families:

Theorem 11 (Separator theorem). Planar graphs have $O(\sqrt{n})$-separator families. Moreover, an $O(\sqrt{n})$-separator tree for a planar graph can be found in time
$O(n \log n)$.

Let $A$ be an $n \times n$ symmetric positive definite matrix. The graph $G(A)$
corresponding to $A$ is defined as follows: $G(A) = (V, E)$, $V = \{1, 2, \dots, n\}$,
$E = \{\{i, j\} \mid i \ne j \text{ and } A_{i,j} \ne 0\}$. The existence of an $O(\sqrt{n})$-separator family for
$G(A)$ makes faster Gaussian elimination possible, as the following theorem of
Lipton and Tarjan shows:

Theorem 12 (Nested dissection). Let $A$ be a symmetric positive definite
matrix and let $G(A)$ have an $O(\sqrt{n})$-separator family. Given an $O(\sqrt{n})$-separator
tree for $G(A)$, Gaussian elimination on $A$ can be performed in time $O(n^{\omega/2})$
using the so-called nested dissection. The resulting LU factorization of $A$ is
given by matrices $L$ and $D$, $A = LDL^T$, where matrix $L$ is unit lower triangular
and has $O(n \log n)$ non-zero entries and matrix $D$ is diagonal.

Remark 13. The assumption of $A$ being symmetric positive definite is needed to
ensure that no diagonal zeros will appear. If we can guarantee this in some other
way, then the assumption can be omitted.

We are not going to present the details of this algorithm. The basic idea is
to permute rows and columns of $A$ using the $\sqrt{n}$-separator tree. Vertices of the
top-level separator $S$ correspond to the last $|S|$ rows and last $|S|$ columns, etc.
When Gaussian elimination is performed in this order, the matrix remains sparse
throughout the elimination.
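To illustrate the ordering idea (not the factorization itself), the following Python sketch computes a nested dissection elimination order for a rectangular grid graph. The grid is an assumed example in which a middle row or column is a simple separator; both halves are ordered recursively before the separator, so the top-level separator comes last. The function name is hypothetical.

```python
def nested_dissection_order(r0, r1, c0, c1):
    """Elimination order for the grid cells [r0, r1) x [c0, c1).

    The middle column (or row) acts as separator; it is ordered after both
    halves, recursively, so the top-level separator ends up last.
    """
    h, w = r1 - r0, c1 - c0
    if h * w <= 2:  # tiny (or empty) block: order its cells directly
        return [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    if w >= h:  # split along the middle column
        mid = (c0 + c1) // 2
        return (nested_dissection_order(r0, r1, c0, mid)
                + nested_dissection_order(r0, r1, mid + 1, c1)
                + [(r, mid) for r in range(r0, r1)])
    mid = (r0 + r1) // 2  # split along the middle row
    return (nested_dissection_order(r0, mid, c0, c1)
            + nested_dissection_order(mid + 1, r1, c0, c1)
            + [(mid, c) for c in range(c0, c1)])


print(nested_dissection_order(0, 3, 0, 3)[-3:])  # [(0, 1), (1, 1), (2, 1)]
```

For a $3 \times 3$ grid, the middle column is the top-level separator and occupies the last three positions of the order, matching the description of the last $|S|$ rows and columns.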

3.2 Vertex Splitting 

The nested dissection algorithm cannot be applied to the random adjacency matrix $A = A(G)$ of a planar graph. This matrix is neither symmetric nor positive
definite. Fortunately, factorization of $A$ is not necessary. For our purposes it is
enough to factorize the matrix $B = AA^T$. Notice that this matrix is symmetric and if $A$ is non-singular (i.e. $G$ has a perfect matching), then $B$ is positive
definite, since $x^T B x = (A^T x)^T (A^T x) > 0$ for every $x \ne 0$.

We now need to show that $G(B)$ and all its subgraphs have small separators,
which need not be true in general, but it is true if $G$ is a bounded degree
graph. It can be shown, using the standard technique of “vertex splitting” (see
for example [11]), that it is sufficient to consider such graphs.

Theorem 14. The problem of finding perfect (maximum) matchings in planar
graphs is reducible in $O(n)$ time to the problem of finding perfect (maximum)
matchings in planar graphs with maximum vertex degree 3. This reduction adds
$O(n)$ new vertices.
In order to use nested dissection on matrix $B$ we have to show that the
graph $G(B)$ has an $O(\sqrt{n})$-separator family. Let $S$ be a small separator in $G(A)$,
and consider the set $T$ containing all vertices of $S$ and all their neighbours. We
call $T$ a thick separator corresponding to $S$. Notice that $B_{i,j}$ can be non-zero
only if there exists a path of length 2 between $v_i$ and $v_j$. Such a path between the two sides of $S$ has its middle vertex in $S$, so both its endpoints belong to $T$; thus $T$ is a separator
in $G(B)$. $T$ is also a small separator, because $G$ has bounded degree and so
$|T| \le 4|S| = O(\sqrt{n})$. In the same manner small separators can be found in
any subgraph of $G(B)$, so the LU factorization of $B$ can be found using the nested
dissection algorithm in time $O(n^{\omega/2})$.
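The thick-separator argument can be checked on a toy example. The sketch below computes $T$, the pairs on which $AA^T$ may be non-zero (vertices with a common neighbour), and tests whether a vertex set separates the two sides; representing the graph as a dict of neighbour sets and the function names are assumptions of this sketch.

```python
from itertools import combinations


def thick_separator(adj, s):
    """T: the vertices of S together with all their neighbours."""
    t = set(s)
    for v in s:
        t |= adj[v]
    return t


def support_of_aat(adj):
    """Pairs {i, j}, i != j, on which (A A^T)_{i,j} may be non-zero.

    (A A^T)_{i,j} = sum_k A_{i,k} A_{j,k}, so i and j need a common
    neighbour k, i.e. a path of length 2 between them.
    """
    return {frozenset(pair) for k in adj for pair in combinations(sorted(adj[k]), 2)}


def separates(pairs, removed, part1, part2):
    """True iff no pair joins part1 - removed with part2 - removed."""
    left = set(part1) - set(removed)
    right = set(part2) - set(removed)
    return not any((a in left and b in right) or (a in right and b in left)
                   for a, b in (tuple(e) for e in pairs))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
S = {2}
T = thick_separator(path, S)  # {1, 2, 3}
pairs = support_of_aat(path)  # {0, 2}, {1, 3}, {2, 4}
```

On the path $0\text{-}1\text{-}2\text{-}3\text{-}4$ with $S = \{2\}$, the pair $\{1, 3\}$ shows that $S$ alone does not separate $G(B)$, while $T = \{1, 2, 3\}$ does.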

3.3 The Testing Algorithm 

The general idea of the testing algorithm is presented in Fig. 1. If the nested
dissection algorithm finds an LU factorization of $B$, then $B$ is non-singular, and
so $A$ is non-singular, thus $G$ has a perfect matching. If, however, the nested
dissection fails, i.e. a zero appears on the diagonal during the elimination,
then $B$ is not positive definite, and so $A$ is singular.



PLANAR-TEST-PERFECT-MATCHING(G):

1. reduce the degrees of vertices in G;

2. compute $B = AA^T$, where $A = A(G)$;

3. run nested dissection on $B$;

4. G has a perfect matching iff the algorithm succeeds, i.e. finds an
LU factorization;


Fig. 1. An algorithm for testing if a planar graph has a perfect matching. 
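The following Python sketch mirrors Fig. 1 under simplifying assumptions: it skips the degree reduction, uses exact rational arithmetic, and replaces nested dissection by dense symmetric Gaussian elimination without pivoting, so it runs in $O(n^3)$ rather than $O(n^{\omega/2})$. Random values come from $\{1, \dots, 10^9\}$; all names are hypothetical.

```python
import random
from fractions import Fraction


def random_adjacency_matrix(n, edges, rng, r=10**9):
    """Skew-symmetric matrix with an independent random value from {1..r} per edge."""
    a = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        x = Fraction(rng.randint(1, r))
        a[i][j], a[j][i] = x, -x
    return a


def ldlt_succeeds(b):
    """Gaussian elimination without pivoting; False as soon as a pivot is zero."""
    m = [row[:] for row in b]
    n = len(m)
    for k in range(n):
        d = m[k][k]
        if d == 0:
            return False  # B is not positive definite, so A is singular
        for i in range(k + 1, n):
            f = m[i][k] / d
            if f:
                for j in range(k, n):
                    m[i][j] -= f * m[k][j]
    return True


def planar_test_perfect_matching(n, edges, seed=0):
    a = random_adjacency_matrix(n, edges, random.Random(seed))
    # B = A A^T
    b = [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    return ldlt_succeeds(b)
```

Since a positive definite matrix keeps positive pivots throughout the elimination, a zero pivot can only occur when $B$, and hence $A$, is singular.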



4 Finding Perfect Matchings in Planar Graphs 

In this section we present an algorithm for finding perfect matchings in pla- 
nar graphs. In Section 5 we show that the more general problem of finding a 
maximum matching reduces to the problem of finding a perfect matching. 

4.1 The General Idea 

For any matrix $X$, let $X_{R,C}$ denote the submatrix of $X$ corresponding to rows $R$
and columns $C$.

The general idea of the matching algorithm is presented in Fig. 2. 

To find a perfect matching in a planar graph, we find a small separator,
match its vertices in an allowed way (i.e. one that can be extended to the set of
all vertices), and then solve the problem for each of the connected components
created by removing the endpoints of this matching. In the remainder of this
section, we show that we can perform steps 3. and 4. in time $O(n^{\omega/2})$. This gives
the complexity bound of $O(n^{\omega/2})$ for the whole algorithm as well.
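The step from "steps 3. and 4. take $O(n^{\omega/2})$" to the overall bound can be spelled out as a recurrence. The derivation below assumes that every connected component of $G - V(M)$ has at most $\frac{2}{3}n$ vertices (these components lie inside $V_1$ or $V_2$ of the separator) and that $\omega > 2$ as in Remark 1; the constants $c$ and $C$ are introduced only for this sketch.

```latex
% T(n): running time on an n-vertex graph; \alpha = \omega/2 > 1 (Remark 1)
T(n) \le \sum_i T(n_i) + c\,n^{\alpha},
  \qquad \sum_i n_i \le n, \qquad \max_i n_i \le \tfrac{2}{3}\,n.
% Induction hypothesis: T(m) \le C\,m^{\alpha} for all m < n. Then
\sum_i C\,n_i^{\alpha}
  \le C \bigl(\max_i n_i\bigr)^{\alpha - 1} \sum_i n_i
  \le C \bigl(\tfrac{2}{3}\bigr)^{\alpha - 1} n^{\alpha},
% so
T(n) \le \Bigl(C \bigl(\tfrac{2}{3}\bigr)^{\alpha - 1} + c\Bigr) n^{\alpha} \le C\,n^{\alpha}
  \quad\text{whenever}\quad C \ge \frac{c}{1 - (2/3)^{\alpha - 1}}.
```

The key point is that $\alpha > 1$ makes $(2/3)^{\alpha - 1} < 1$, so the work at the top level dominates the whole recursion.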
PLANAR-PERFECT-MATCHING(G):

1. run PLANAR-TEST-PERFECT-MATCHING(G);

2. let S be a small separator in G and let T be the corresponding
thick separator;

3. find $(A(G)^{-1})_{T,T}$;

4. using the FIND-ALLOWED-SEPARATOR-MATCHING procedure, find an allowed matching M incident on all vertices of S;

5. find perfect matchings in connected components of $G - V(M)$;



Fig. 2. An algorithm for finding perfect matchings in planar graphs.



4.2 Computing the Important Part of $A(G)^{-1}$

We could easily find $(A(G)^{-1})_{T,T}$ if we had an LU factorization of $A = A(G)$.
Unfortunately, $A$ is not symmetric positive definite, so we cannot use the fast
nested dissection algorithm to factorize $A$. In the testing phase we find $n \times n$
matrices $L$ and $D$ such that $AA^T = LDL^T$. We now show how $L$ and $D$ can
be used to compute $(A^{-1})_{T,T}$ in time $O(n^{\omega/2})$. Let us represent $A$, $L$ and $L^{-1}$ as
block matrices

$$A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}, \qquad
L = \begin{pmatrix} L_{1,1} & 0 \\ L_{2,1} & L_{2,2} \end{pmatrix}, \qquad
L^{-1} = \begin{pmatrix} L_{1,1}^{-1} & 0 \\ -L_{2,2}^{-1} L_{2,1} L_{1,1}^{-1} & L_{2,2}^{-1} \end{pmatrix},$$

where the lower right blocks in all matrices correspond to the vertices of the thick
separator $T$, for example $A_{T,T} = A_{2,2}$. Since $AA^T = LDL^T$, we have






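$$(A^T)^{-1} = (L^T)^{-1} D^{-1} L^{-1} A,$$

where the interesting part of $(A^T)^{-1}$ is

$$\begin{aligned}
((A^T)^{-1})_{T,T} &= ((L^T)^{-1})_{T,V}\, D^{-1} L^{-1} A_{V,T} \\
&= (L_{2,2}^T)^{-1} D_{2,2}^{-1} (L^{-1})_{T,V}\, A_{V,T} \\
&= (L_{2,2}^T)^{-1} D_{2,2}^{-1} L_{2,2}^{-1} A_{2,2} + (L_{2,2}^T)^{-1} D_{2,2}^{-1} (L^{-1})_{2,1} A_{1,2}.
\end{aligned}$$

Since $A$ is skew-symmetric, $(A^T)^{-1} = -A^{-1}$, so this also yields $(A^{-1})_{T,T}$.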

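The first component can be easily computed in time $O(n^{\omega/2})$ using fast matrix
multiplication, since all the matrices involved are of size $|T| \times |T|$ with $|T| = O(\sqrt{n})$. The second component can be written as

$$(L_{2,2}^T)^{-1} D_{2,2}^{-1} (L^{-1})_{2,1} A_{1,2} = (L_{2,2}^T)^{-1} D_{2,2}^{-1} L_{2,2}^{-1} \left(-L_{2,1} L_{1,1}^{-1} A_{1,2}\right)$$

and the only hard part here is to compute $X = -L_{2,1} L_{1,1}^{-1} A_{1,2}$. Consider the
matrix

$$B = \begin{pmatrix} L_{1,1} & A_{1,2} \\ L_{2,1} & 0 \end{pmatrix}.$$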
When Gaussian elimination is performed on the non-separator rows and columns of $B$, the lower right submatrix becomes $X$. This is a well known property
of the Schur complement. The elimination can be performed with use of the
nested dissection algorithm in time $O(n^{\omega/2})$. The idea here is that the separator
tree for $AA^T$ is a valid separator tree for $L$, thus also for $B$. The new non-zero
entries of $L$ introduced by Gaussian elimination (so-called fill-in) correspond
to edges that can only go upwards in the separator tree, from a child to one
of its ancestors (see [20]). Notice that since $L_{1,1}$ is unit lower triangular, there are no
problems with diagonal zeros, even though $B$ is not symmetric positive definite.
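The Schur complement claim can be checked on a small example with exact arithmetic. The sketch eliminates the first $k$ rows and columns of $B$ densely (not by nested dissection) and compares the result with $X = -L_{2,1} L_{1,1}^{-1} A_{1,2}$ computed by forward substitution; the example matrices and function names are our own.

```python
from fractions import Fraction


def schur_complement(m, k):
    """Eliminate the first k rows/columns (no pivoting); return the lower-right block."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    for p in range(k):
        if m[p][p] == 0:
            raise ValueError("zero pivot")
        for r in range(p + 1, n):
            f = m[r][p] / m[p][p]
            if f:
                m[r] = [x - f * y for x, y in zip(m[r], m[p])]
    return [row[k:] for row in m[k:]]


def x_block(l11, l21, a12):
    """X = -L21 L11^{-1} A12, with L11^{-1} A12 obtained by forward substitution."""
    k, cols = len(l11), len(a12[0])
    y = [[Fraction(0)] * cols for _ in range(k)]  # y = L11^{-1} A12
    for i in range(k):
        for c in range(cols):
            s = Fraction(a12[i][c]) - sum(l11[i][t] * y[t][c] for t in range(i))
            y[i][c] = s / l11[i][i]
    return [[-sum(row[t] * y[t][c] for t in range(k)) for c in range(cols)]
            for row in l21]


L11 = [[1, 0], [2, 1]]  # unit lower triangular, so no zero pivots can occur
L21 = [[3, 1]]
A12 = [[1], [4]]
B = [L11[0] + A12[0], L11[1] + A12[1], L21[0] + [0]]  # B = [[L11, A12], [L21, 0]]
print(schur_complement(B, 2), x_block(L11, L21, A12))  # both equal [[-5]]
```

Both computations give the same $1 \times 1$ block, and the unit diagonal of $L_{1,1}$ guarantees that the pivots encountered during the elimination are all equal to 1.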

4.3 Matching the Separator Vertices 

We now show how the separator vertices can be matched using the matching
verification algorithm. Consider the procedure presented in Fig. 3.



FIND-ALLOWED-SEPARATOR-MATCHING:

1. let $M = \emptyset$;

2. let $G_T = (T, E(T) - E(T - S))$;

3. let $M_G$ be any inclusion-wise maximal matching in $G_T$ using only
allowed edges;

4. run the verification algorithm of Theorem 10 on $(A^{-1})_{T,T}$ to find
a maximal allowed submatching $M'_G$ of $M_G$;

5. add $M'_G$ to $M$;

6. remove the vertices matched by $M'_G$ from $G_T$;

7. mark edges in $M_G - M'_G$ as not allowed;

8. if $M$ does not match all vertices of $S$, go to step 3;



Fig. 3. A procedure for finding an allowed submatching of the separator.
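The control flow of Fig. 3 can be sketched in Python. To keep the sketch self-contained, the verification algorithm of Theorem 10 is replaced by a brute-force stand-in that checks extendability by exhaustive search, so it only works on tiny graphs. Graphs are given as vertex sets and edge lists, and all names are hypothetical.

```python
def has_perfect_matching(vertices, edges):
    """Exhaustive search (tiny graphs only): match the smallest vertex in every way."""
    if not vertices:
        return True
    v = min(vertices)
    rest = vertices - {v}
    return any(frozenset((v, w)) in edges and has_perfect_matching(rest - {w}, edges)
               for w in rest)


def allowed_submatching(unmatched, edges, candidate):
    """Brute-force stand-in for Theorem 10: keep an edge while the kept edges stay extendable."""
    left, kept = set(unmatched), []
    for u, w in candidate:
        if has_perfect_matching(left - {u, w}, edges):
            kept.append((u, w))
            left -= {u, w}
    return kept


def find_allowed_separator_matching(vertices, edge_list, s, t):
    edges = {frozenset(e) for e in edge_list}
    s, t = set(s), set(t)
    unmatched, not_allowed, m = set(vertices), set(), []
    while s & unmatched:  # step 8: repeat until every vertex of S is matched
        # step 3: greedy maximal matching in G_T = (T, E(T) - E(T - S))
        g_t = sorted(tuple(sorted(e)) for e in edges
                     if e <= t & unmatched and e & s and e not in not_allowed)
        m_g, used = [], set()
        for u, w in g_t:
            if u not in used and w not in used:
                m_g.append((u, w))
                used |= {u, w}
        if not m_g:
            raise ValueError("no allowed matching covers S")
        kept = allowed_submatching(unmatched, edges, m_g)  # step 4
        m += kept  # step 5
        for u, w in kept:  # step 6
            unmatched -= {u, w}
        not_allowed |= {frozenset(e) for e in m_g if e not in kept}  # step 7
    return m
```

On the path $3\text{-}0\text{-}1\text{-}2$ with $S = \{0\}$ and $T = \{0, 1, 3\}$, the first greedy choice $(0, 1)$ is rejected and marked as not allowed, and the second iteration picks $(0, 3)$.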



The maximal allowed submatching $M'_G$ of $M_G$ can be found in time $O(n^{\omega/2})$
using the verification algorithm, since it runs on a $|T| \times |T|$ matrix and $|T| = O(\sqrt{n})$. This algorithm requires the inverse $A(G)^{-1}$ of
the random adjacency matrix of $G$, but it never uses any values from outside the
submatrix $(A(G)^{-1})_{T,T}$ corresponding to the vertices of $T$, so we only have to
compute this submatrix. Let $\hat{A}$ be the result of running the verification algorithm
on the matrix $(A^{-1})_{T,T}$. Notice that, due to Theorem 7, $\hat{A}_{T',T'}$ equals the corresponding submatrix of the inverse of the random adjacency matrix of $G - V(M'_G)$,
where $T'$ is obtained from $T$ by removing the vertices matched by $M'_G$. Thus the
inverse does not need to be computed from scratch in each iteration of the loop.

Now, consider the allowed matching $M$ covering $S$ found by the above algorithm. Notice that any edge $e$ of $M$ is either incident on at least one edge of
the inclusion-wise maximal matching $M_G$ or is contained in $M_G$, because of the
maximality of $M_G$. If $e$ is in $M_G$, it is chosen in step 4, otherwise one of the
edges incident to $e$ is marked as not allowed. Every edge $e \in M$ has at most 4
incident edges (the maximum degree is 3), so the loop is executed at most 5 times and the whole procedure
works in time $O(n^{\omega/2})$.
5 Maximum Versus Perfect Matchings

We now show that the problem of finding a maximum matching can be reduced in
time $O(n^{\omega/2})$ to the problem of finding a perfect matching. The problem is to find
the largest subset $W \subseteq V$ such that the induced subgraph $G[W]$ has a perfect matching.
Notice that this is equivalent to finding the largest subset $W \subseteq V$ such that
$A_{W,W}$ is non-singular. The basic idea is to use the nested dissection algorithm.
We first show that non-singular submatrices of $AA^T$ correspond to non-singular
submatrices of $A$ (note that Lemma 15, Theorem 17 and Theorem 18 are all well
known facts).

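Lemma 15. The matrix $AA^T$ has the same rank as $A$.

Proof. We will prove that $\ker(A) = \ker(A^T A)$; the inclusion $\ker(A) \subseteq \ker(A^T A)$ is clear. Let $v$ be such that $(A^T A)v = 0$.
We have

$$0 = v^T (A^T A) v = (v^T A^T)(Av) = (Av)^T (Av),$$

so $Av = 0$. Hence $\operatorname{rank}(A^T A) = \operatorname{rank}(A)$, and applying this to $A^T$ in place of $A$ gives $\operatorname{rank}(AA^T) = \operatorname{rank}(A^T) = \operatorname{rank}(A)$. □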



Remark 16. Notice that this proof (and the proof of Lemma 19 later in this
section) relies on the fact that $v^T v = 0$ iff $v = 0$. This is not true for finite fields,
and we will have to deal with this problem in the next section.

We will also need the following classic theorem of Frobenius (see [21]):

Theorem 17 (Frobenius Theorem). Let $A$ be an $n \times n$ skew-symmetric matrix and let $X, Y \subseteq \{1, \dots, n\}$ such that $|X| = |Y| = \operatorname{rank}(A)$. Then

$$\det(A_{X,X}) \det(A_{Y,Y}) = (-1)^{|X|} \det{}^2(A_{X,Y}).$$

Now, we are ready to prove the following 

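Theorem 18. If $(AA^T)_{W,W}$ is non-singular and $|W| = \operatorname{rank}(AA^T)$, then $A_{W,W}$
is also non-singular.

Proof. We have $(AA^T)_{W,W} = A_{W,V} (A_{W,V})^T$, so $\operatorname{rank}(A_{W,V}) = \operatorname{rank}(AA^T)$ (it is at least $\operatorname{rank}((AA^T)_{W,W}) = |W|$ and at most $|W|$). By
Lemma 15, this is equal to $\operatorname{rank}(A)$. Let $A_{W,U}$ be any square submatrix of
$A_{W,V}$ of maximal rank. From the Frobenius Theorem (with $X = W$ and $Y = U$) it follows that $A_{W,W}$ also has
maximal rank. □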

The only question now is whether $AA^T$ always has a submatrix $(AA^T)_{W,W}$ (i.e.,
a symmetrically placed submatrix) of maximal rank. There are many ways to
prove this fact, but we use the one that leads to an algorithm for actually finding
this submatrix.

Lemma 19. If $A$ is any matrix and $(AA^T)_{i,i} = 0$, then $(AA^T)_{i,j} = (AA^T)_{j,i} = 0$ for all $j$.
Proof. Let $e_j$ be the $j$-th unit vector. We have

$$0 = (AA^T)_{i,i} = (e_i^T A)(A^T e_i) = (A^T e_i)^T (A^T e_i),$$

so $A^T e_i = 0$. But then $(AA^T)_{j,i} = (e_j^T A)(A^T e_i) = 0$ for any $j$, and the same holds for
$(AA^T)_{i,j}$. □



Theorem 20. A submatrix $(AA^T)_{W,W}$ of $AA^T$ of maximal rank always exists
and can be found in time $O(n^{\omega/2})$ using the nested dissection algorithm.

Proof. We perform the nested dissection algorithm on the matrix $AA^T$. At any
stage of the computations, the matrix we are working on is of the form $BB^T$ for
some $B$. It follows from Lemma 19 that if a diagonal entry we want to eliminate
has value zero, then the row and the column corresponding to this entry consist
of only zeros. We ignore these and proceed with the elimination. The matrix
$(AA^T)_{W,W}$ we are looking for consists of all non-ignored rows and columns. □
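Theorem 20, and with it Corollary 21, can be illustrated with dense elimination in exact rational arithmetic. This sketch does not use nested dissection, and the sampling range and names are assumptions. Zero pivots are skipped, and the indices of the remaining pivots form $W$.

```python
import random
from fractions import Fraction


def random_adjacency_matrix(n, edges, rng, r=10**9):
    a = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        x = Fraction(rng.randint(1, r))
        a[i][j], a[j][i] = x, -x
    return a


def max_rank_principal_rows(b):
    """Eliminate in order, skipping zero pivots; return the indices W of non-zero pivots."""
    m = [row[:] for row in b]
    n = len(m)
    w = []
    for k in range(n):
        if m[k][k] == 0:  # by Lemma 19 the whole row and column are zero: ignore them
            continue
        w.append(k)
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            if f:
                for j in range(k, n):
                    m[i][j] -= f * m[k][j]
    return w


def largest_matchable_subset(n, edges, seed=0):
    a = random_adjacency_matrix(n, edges, random.Random(seed))
    # B = A A^T, then find a symmetrically placed submatrix of maximal rank
    b = [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    return max_rank_principal_rows(b)


print(largest_matchable_subset(4, [(0, 1), (0, 2), (0, 3)]))  # [0, 1]
```

For the star with centre 0 and leaves 1, 2, 3 the sketch returns $W = \{0, 1\}$, and $G[W]$ indeed has a perfect matching.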



Corollary 21. For any planar graph $G = (V, E)$, a largest subset $W \subseteq V$ such
that $G[W]$ has a perfect matching can be found in time $O(n^{\omega/2})$.

We have thus argued that for planar graphs the maximum matching problem
can be reduced in time $O(n^{\omega/2})$ to the perfect matching problem. Since we can
solve the latter in time $O(n^{\omega/2})$, we can solve the former in the same time.

6 Working over a Finite Field 

So far, we have shown an algorithm finding a maximum matching in a planar
graph using $O(n^{\omega/2})$ rational arithmetic operations. It can be shown that, with
high probability, our algorithm gives the same results if performed using the
finite field arithmetic $\mathbb{Z}_p$ for a randomly chosen prime $p$ of polynomial size. Due to space
limitations we defer these considerations to the full version of this paper.
Abstract. We generalize univariate multipoint evaluation of polynomials
of degree n at sublinear amortized cost per point. More precisely, it
is shown how to evaluate a bivariate polynomial p of maximum degree
less than n, specified by its $n^2$ coefficients, simultaneously at $n^2$ given
points using a total of $O(n^{2.667})$ arithmetic operations. In terms of the
input size N being quadratic in n, this amounts to an amortized cost of
$O(N^{0.334})$ per point.

1 Introduction 

By Horner’s Rule, any polynomial p of degree less than n can be evaluated at a
given argument x in O(n) arithmetic operations, which is optimal for a generic
polynomial as proved by Pan (1966); see for example Theorem 6.5 in Bürgisser,
Clausen & Shokrollahi (1997).
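For reference, here is Horner's Rule in a few lines. Coefficient lists are stored lowest degree first; this is our convention, not the paper's. The function name `horner` is illustrative.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with one multiplication and one
    addition per coefficient; coeffs are stored lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*2**2 = 17
```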

In order to evaluate p at several points, we might sequentially compute $p(x_k)$
for $0 \le k < n$. However, given that both the input, consisting of the n coefficients
of p and the n points $x_k$, and the output, consisting of the n values $p(x_k)$, have
only linear size, information theory provides no justification for this quadratic
total running time. In fact, a more sophisticated algorithm makes it possible to compute
all $p(x_k)$ simultaneously using only $O(n \cdot \log^2 n \cdot \log\log n)$ operations. Based
on the Fast Fourier Transform, the mentioned algorithms and others realize
what is known as Fast Polynomial Arithmetic. For ease of notation, we use the
‘soft-Oh’ notation, namely $O^\sim(f(n)) := O\big(f(n)(\log f(n))^{O(1)}\big)$. This variant
of the usual asymptotic ‘big-Oh’ notation ignores poly-logarithmic factors like
$\log^2 n \cdot \log\log n$.

Fact 1. Let R be a commutative ring with one.

(i) Multiplication of univariate polynomials: Suppose we are given polynomials
$p, q \in R[X]$ of degree less than n, specified by their coefficients. Then we can
compute the coefficients of the product polynomial $p \cdot q \in R[X]$ using $O^\sim(n)$
arithmetic operations in R.

* Supported by the DFG Research Training Group GK-693 of the Paderborn Institute 
for Scientific Computation (PaSCo) 

S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 544-555, 2004. 

© Springer-Verlag Berlin Heidelberg 2004 



Fast Multipoint Evaluation of Bivariate Polynomials 545 



(ii) Multipoint evaluation of a univariate polynomial: Suppose we are given a
polynomial $p \in R[X]$ of degree less than n, again specified by its coefficients,
and points $x_0, \ldots, x_{n-1} \in R$. Then we can compute the values
$p(x_0), \ldots, p(x_{n-1}) \in R$ using $O^\sim(n)$ arithmetic operations in R.

(iii) Univariate interpolation: Conversely, suppose we are given points $(x_k, y_k) \in
R^2$ for $0 \le k < n$ such that $x_k - x_\ell$ is invertible in R for all $k \ne \ell$. Then we
can compute the coefficients of a polynomial $p \in R[X]$ of degree less than
n such that $p(x_k) = y_k$, $0 \le k < n$, that is, determine the interpolation
polynomial to data $(x_k, y_k)$ using $O^\sim(n)$ arithmetic operations in R.

Proof. These results can be found, for example, in von zur Gathen & Gerhard
(2003), including small constants:

(i) can be done using at most $63.427 \cdot n \log_2 n \cdot \log_2\log_2 n + O(n \log n)$ arithmetic
operations in R by Theorem 8.23. The essential ingredient is the Fast Fourier
Transform. If $R = \mathbb{C}$, then even $\frac{9}{2} n \log_2 n + O(n)$ arithmetic operations suffice.
This goes back to Schönhage & Strassen (1971) and Schönhage (1977).

In the following, M(n) denotes the cost of one multiplication of univariate
polynomials over R of degree less than n.

(ii) can be done using at most $\frac{11}{2} M(n) \log_2 n + O(n \log n)$ operations in R according
to Corollary 10.8. Here, Divide & Conquer provides the final building
block. This goes back to Fiduccia (1972).

(iii) can be done using at most $\frac{13}{2} M(n) \log_2 n + O(n \log n)$ operations in R according
to Corollary 10.12. This, too, is completed by Divide & Conquer.
The result goes back to Horowitz (1972).

You will also find an excellent account of all these in Borodin & Munro (1975). □
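Below is a minimal sketch of the Divide & Conquer behind Fact 1(ii). It builds a subproduct tree of the linear factors $X - x_k$ and pushes the remainder of p down the tree, so that the leaf remainders are the values $p(x_k)$. To stay short and self-contained it uses schoolbook multiplication and division. It therefore does not attain the $O^\sim(n)$ bound and only illustrates the tree structure. The function names are hypothetical, and at least one point is assumed.

```python
def poly_mul(a, b):
    """Schoolbook product of coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_rem(a, f):
    """Remainder of a modulo a monic f, as exactly deg(f) coefficients."""
    a, d = list(a), len(f) - 1
    for i in range(len(a) - 1, d - 1, -1):
        c = a[i]
        if c:
            for j in range(d + 1):
                a[i - d + j] -= c * f[j]
    r = a[:d]
    return r + [0] * (d - len(r))


def subproduct_tree(points):
    """Tree levels: leaves X - x_k, each parent the product of its children."""
    level = [[-x, 1] for x in points]
    tree = [level]
    while len(level) > 1:
        level = [poly_mul(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree


def multipoint_eval(coeffs, points):
    """Reduce p modulo the root, then push remainders down to the leaves."""
    tree = subproduct_tree(points)
    rems = [poly_rem(coeffs, tree[-1][0])]
    for level in reversed(tree[:-1]):
        rems = [poly_rem(rems[i // 2], node) for i, node in enumerate(level)]
    return [r[0] for r in rems]  # p rem (X - x_k) is the constant p(x_k)


print(multipoint_eval([1, -3, 0, 2], [0, 1, 2, -1, 5]))  # [1, 0, 11, 2, 236]
```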

Fast polynomial arithmetic, and in particular multipoint evaluation, has found
many applications in algorithmic number theory (see for example Odlyzko &
Schönhage 1988), computer aided geometric design (see for example Lodha &
Goldman 1997), and computational physics (see for example Ziegler ????b).

Observe that the above claims apply to the univariate case. What about
multivariate analogues? Let us for a start consider the bivariate case: A bivariate
polynomial $p \in R[X, Y]$ of maximum degree $\operatorname{maxdeg} p := \max\{\deg_X p, \deg_Y p\}$
less than n has up to $n^2$ coefficients, one for each monomial $X^i Y^j$ with
$0 \le i, j < n$. Now, corresponding to Fact 1, the following questions emerge:

Question 2. (i) Multiplication of bivariate polynomials: Can two given bivariate
polynomials of maximum degree less than n be multiplied within time
$O^\sim(n^2)$?

(ii) Multipoint evaluation of a bivariate polynomial: Can a given bivariate
polynomial of maximum degree less than n be evaluated simultaneously at $n^2$
arguments in time $O^\sim(n^2)$?

(iii) Bivariate interpolation: Given points $(x_k, y_k, z_k) \in R^3$, is there a polynomial
$p \in R[X, Y]$ of maximum degree less than n such that $p(x_k, y_k) = z_k$
for all $0 \le k < n^2$? And, if yes, can we compute it in time $O^\sim(n^2)$?
Such issues also arise, for instance, in connection with fast arithmetic for
polynomials over the skew-field of hypercomplex numbers (Ziegler ????a, Section 3.1).

A positive answer to Question 2(i) is achieved by embedding p and q into
univariate polynomials of degree $O(n^2)$ using the Kronecker substitution $Y = X^{2n-1}$,
applying Fact 1(i) to them, and then re-substituting the result to a bivariate
polynomial; see for example Corollary 8.28 in von zur Gathen & Gerhard
(2003) or Section 1.8 in Bini & Pan (1994).
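Below is a sketch of Kronecker substitution for Question 2(i). It assumes that `p[i][j]` is the coefficient of $X^i Y^j$ and that both factors have maximum degree less than n. The product has X-degree at most $2n-2$, so the substitution $Y = X^{2n-1}$ keeps all coefficients apart. The single univariate product is schoolbook here, where Fact 1(i) would use a fast one. The names are illustrative.

```python
def poly_mul(a, b):
    """Schoolbook product of coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def kronecker_mul(p, q, n):
    """Bivariate product via Y = X^(2n-1), one univariate product, re-substitution."""
    D = 2 * n - 1               # product has X-degree <= 2n-2 < D, so no overlap
    size = D * (n - 1) + n      # highest packed exponent is (n-1) + D*(n-1)

    def pack(a):
        u = [0] * size
        for i in range(n):
            for j in range(n):
                u[i + D * j] = a[i][j]
        return u

    prod = poly_mul(pack(p), pack(q))
    r = [[0] * D for _ in range(D)]     # result has degree < 2n-1 in X and Y
    for k, c in enumerate(prod):
        r[k % D][k // D] = c            # X-exponent k mod D, Y-exponent k div D
    return r
```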

Note that the first part of (iii) has a negative answer, for instance, whenever the
points $(x_k, y_k)$ are collinear or, more generally, lie on a curve of small degree:
here, a bivariate polynomial of maximum degree less than n does not even exist
in general.

Addressing (ii), observe that Kronecker substitution is not compatible with
evaluation and thus of no direct use for reducing to the univariate case. The
methods that yield Fact 1(ii) are not applicable either, as they rely on fast
polynomial division with remainder, which loses many of its nice mathematical and
computational properties when passing from the univariate to the bivariate case.

Nevertheless, (ii) does admit a rather immediate positive answer provided the
arguments $(x_k, y_k)$, $0 \le k < n^2$, form a Cartesian $n \times n$-grid (also called tensor
product grid). Indeed, consider $p(X, Y) = \sum_{0 \le j < n} q_j(X)\, Y^j$ as a polynomial in
Y with coefficients $q_j$ being univariate polynomials in X. Then multi-evaluate $q_j$
at the n distinct values $x_k$: as $q_j$ has degree less than n, this takes time $O^\sim(n)$
for each j, adding up to a total of $O^\sim(n^2)$. Finally, take the n different univariate
polynomials $p(x_k, Y)$ in Y of degree less than n and multi-evaluate each at the
n distinct values $y_\ell$: this takes another $O^\sim(n^2)$.

The presumption that the arguments form a Cartesian grid allows for a
slight relaxation in that this grid may be rotated and sheared: such an
affine distortion is easy to detect and to revert on the arguments, and it can then
instead be applied to the polynomial p by transforming its coefficients within time
$O^\sim(n^2)$; see Lemma 14 below. The transformed polynomial can then be evaluated on the
now strictly Cartesian grid as described above. However, $n \times n$ grids, even rotated
and sheared ones, form only a lower-dimensional set within the $2n^2$-dimensional space of all
possible configurations of $n^2$ points. Thus this is a severe restriction.

Fig. 1. Cartesian 8 × 8-grid, same rotated and sheared; 64 generic points.
2 Goal and Idea

The big open question and goal of the present work is concerned with fast
multipoint evaluation of a multivariate polynomial. As a first step in this direction
we consider the bivariate case.

The naive approach to this problem, namely sequentially calculating all
$p(x_k, y_k)$, takes quadratic time each, thus incurring a total cost of order $n^4$. A first
improvement to $O^\sim(n^3)$ is based on the simple observation that any n points
in the plane can easily be extended to an $n \times n$ grid on which, by the above
considerations, multipoint evaluation of p is feasible in time $O^\sim(n^2)$. So we
may partition the $n^2$ arguments into n blocks of n points and multi-evaluate p
sequentially on each of them to obtain the following

Theorem 3. Let R be a commutative ring with one. A bivariate polynomial
$p \in R[X, Y]$ of $\deg_X(p) < n$ and $\deg_Y(p) < n$, given by its coefficients, can be
evaluated simultaneously at $n^2$ given arguments $(x_k, y_k)$ using at most
$O(n^3 \cdot \log^2 n \cdot \log\log n)$ arithmetic operations in R.

We reduce this softly cubic upper complexity bound to $O(n^{2.667})$. More
precisely, by combining fast univariate polynomial arithmetic with fast matrix
multiplication, we will prove:

Result 4. Let K denote an arbitrary field. A bivariate polynomial $p \in K[X, Y]$
of $\deg_X(p) < n$ and $\deg_Y(p) < m$, specified by its coefficients, can be evaluated
simultaneously at N given arguments $(x_k, y_k) \in K^2$ with pairwise different first
coordinates using $O\big((N + nm) \cdot m^{\omega_2/2 - 1 + \varepsilon}\big)$ arithmetic operations in K for any
fixed $\varepsilon > 0$.

Here, $\omega_2$ denotes the exponent of the multiplication of $n \times n$- by rectangular
$n \times n^2$-matrices; see Section 3. In fact, this problem is well known to admit a
much faster solution than the naive $O(n^4)$, the current world record $\omega_2 < 3.334$
being due to Huang & Pan (1998). By choosing m = n and $N = n^2$, this yields
the running time claimed in the abstract.
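A quick check of the exponents follows. It assumes Result 4's bound has the form $O((N + nm)\, m^{\omega_2/2 - 1 + \varepsilon})$, which is what $\lceil N/(nm) \rceil$ applications of Theorem 8 produce, and it uses $\omega_2 < 3.3339532438$ from Fact 6(iii):

$$\Big\lceil \tfrac{N}{nm} \Big\rceil \cdot n\, m^{\omega_2/2 + \varepsilon} \le \Big(\tfrac{N}{nm} + 1\Big)\, n\, m^{\omega_2/2 + \varepsilon} = (N + nm)\, m^{\omega_2/2 - 1 + \varepsilon},$$

$$m = n,\ N = n^2: \qquad 2n^2 \cdot n^{\omega_2/2 - 1 + \varepsilon} = O\big(n^{1 + \omega_2/2 + \varepsilon}\big), \qquad 1 + \tfrac{\omega_2}{2} < 2.66698 < 2.667.$$

Choosing $\varepsilon$ small enough gives $O(n^{2.667})$ in total. Per point this is $O(n^{0.667}) = O(N^{0.3335}) \subseteq O(N^{0.334})$.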

The general idea underlying Result 4, illustrated for the case n = m,
is to reduce the bivariate to the univariate case by substituting Y in p(X, Y)
with the interpolation polynomial g(X) of degree less than $n^2$ to data $(x_k, y_k)$.
It then suffices to multi-evaluate the univariate result p(X, g(X)) at the $n^2$
arguments $x_k$. Obviously, this can only work if such an interpolation polynomial
g is available, that is, if any two evaluation points $(x_k, y_k) \ne (x_{k'}, y_{k'})$ differ in their
first coordinates, $x_k \ne x_{k'}$. However, this condition can easily be ensured later
on (see Section 6), so for now assume it is fulfilled.

This naive substitution leads to a polynomial of degree up to $O(n^3)$. On the
other hand, it obviously suffices to obtain p(X, g(X)) modulo the polynomial
$f(X) := \prod_{0 \le k < n^2} (X - x_k)$, which has degree $n^2$. The key to efficient
bivariate multipoint evaluation is thus an efficient algorithm for this modular
bi-to-univariate composition problem, presented in Theorem 9.
As we make heavy use of fast matrix multiplication, Section 3 recalls some
basic facts, observations, and the state of the art in that field of research.
Section 4 formally states the main result of the present work together with two
tools (affine substitution and modular composition) which might be of interest
on their own, their proofs being postponed to Section 5. Section 6 describes three
ways to deal with arguments that do have coinciding first coordinates. Section 7
gives some final remarks.



3 Basics on Fast Matrix Multiplication 

Recall that, for a field K, $\omega = \omega(K) \ge 2$ denotes the exponent of matrix
multiplication, that is, the least real number such that $m \times m$ matrix multiplication is feasible in
asymptotic time $O(m^{\omega+\varepsilon})$ for any $\varepsilon > 0$; see for example Chapter 15 in Bürgisser
et al. (1997). The current world record, due to Coppersmith & Winograd (1990),
achieves $\omega < 2.376$ independent of the ground field K. The Notes 12.1 in von zur
Gathen & Gerhard (2003) contain a short historical account.

Clearly, a rectangular matrix multiplication of, say, $m \times m$-matrices by
$m \times m^t$-matrices can always be done by partitioning into $m \times m$ square matrices. Yet,
in some cases better algorithms are known. We use the notation
introduced by Huang & Pan (1998): $\omega(r, s, t)$ denotes the exponent of the
multiplication of $\lceil m^r \rceil \times \lceil m^s \rceil$- by $\lceil m^s \rceil \times \lceil m^t \rceil$-matrices, that is,

$$\omega(r, s, t) = \inf\Big\{\tau \in \mathbb{R} \;\Big|\; \text{multiplication of } \lceil m^r \rceil \times \lceil m^s \rceil\text{- by } \lceil m^s \rceil \times \lceil m^t \rceil\text{-matrices can be done with } O(m^\tau) \text{ arithmetic operations}\Big\}.$$

Clearly, $\omega = \omega(1, 1, 1)$. We always have

$$\max\{r + s,\ r + t,\ s + t\} \le \omega(r, s, t) \le r + s + t. \tag{5}$$

Note that $\omega(r, s, t)$ is in fact invariant under permutation of its arguments.

We collect some known bounds on fast matrix multiplication algorithms. 

Fact 6. (i) $\omega = \omega(1, 1, 1) \le \log_2(7) < 2.8073549221$ (Strassen 1969).

(ii) $\omega = \omega(1, 1, 1) < 2.3754769128$ (Coppersmith & Winograd 1990).

(iii) $\omega_2 := \omega(1, 1, 2) < 3.3339532438$ (Huang & Pan 1998).

Partitioning into square matrices only yields $\omega_2 \le \omega + 1 < 3.3754769128$.
Bounds for further rectangular matrix multiplications can also be found
in Huang & Pan (1998). It is conjectured that $\omega = 2$. Then, by partitioning
into square blocks, $\omega(r, s, t)$ also attains its lower bound in (5), that is,
$\omega(r, s, t) = \max\{r + s, r + t, s + t\}$. In particular, $\omega_2 = 3$ then.
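Strassen's bound in Fact 6(i) comes from multiplying $2 \times 2$ block matrices with seven instead of eight block products. This gives $T(m) = 7T(m/2) + O(m^2) = O(m^{\log_2 7})$. Below is a plain recursive sketch for sizes that are powers of two. It leaves out practical refinements such as switching to the naive method below a cutoff, and the function name is our own.

```python
def strassen(A, B):
    """Multiply m x m matrices (m a power of two) with 7 recursive products."""
    m = len(A)
    if m == 1:
        return [[A[0][0] * B[0][0]]]
    h = m // 2

    def quad(M, r, c):
        return [row[c:c + h] for row in M[r:r + h]]

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)   # M1 + M4 - M5 + M7
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)   # M1 - M2 + M3 + M6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```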

We point out that the definitions of $\omega$ and $\omega(r, s, t)$ refer to arbitrary algebraic
computations which, furthermore, may be non-uniform, that is, may use a different
algorithm for each matrix size m. However, closer inspection of Section 15.1
in Bürgisser et al. (1997) reveals the following

Observation 7. Rectangular matrix multiplication of $\lceil m^r \rceil \times \lceil m^s \rceil$- by $\lceil m^s \rceil \times
\lceil m^t \rceil$-matrices over K can be done with $O(m^{\omega(r,s,t)+\varepsilon})$ arithmetic operations in
K by a uniform, bilinear algorithm for any fixed $\varepsilon > 0$.

A bilinear computation is a very special kind of algorithm where, apart from
additions and scalar multiplications, only bilinear multiplications occur; see for
example Definition 14.7 in Bürgisser et al. (1997) for more details. In particular,
no divisions are allowed.



4 Main Results 



Our major contribution concerns bivariate multi-evaluation at arguments $(x_k, y_k)$
under the condition that their first coordinates $x_k$ are pairwise distinct. This
amounts to a weakened general position presumption, as is common for instance
in Computational Geometry.

For notational convenience, we define ‘$O^{\approx}$’ (smooth-Oh) which, in addition
to polylogarithmic factors in n, also ignores factors $n^\varepsilon$ as long as $\varepsilon > 0$ can
be chosen arbitrarily small. Formally, $O^{\approx}(f(n)) := \bigcap_{\varepsilon > 0} O(f(n) \cdot n^\varepsilon)$. Note that
$O^\sim(f(n)) \subseteq O^{\approx}(f(n))$.



Theorem 8. Let K denote a field. Suppose $n, m \in \mathbb{N}$. Given the nm coefficients
of a bivariate polynomial p with $\deg_X(p) < n$ and $\deg_Y(p) < m$, and given nm
points $(x_k, y_k) \in K^2$, $0 \le k < nm$, such that the first coordinates $x_k$ are pairwise
different, we can calculate the nm values $p(x_k, y_k)$ using $O^{\approx}(n m^{\omega_2/2})$ arithmetic
operations over K. The algorithm is uniform.

Observe that this yields the first part of Result 4 by performing $\lceil N/(nm) \rceil$
separate multipoint evaluations at nm points each. Let us also remark that
any further progress in matrix multiplication immediately carries over to our
problem. As it is conjectured that $\omega = 2$ holds, this would lead to bivariate
multipoint evaluation within time $O^{\approx}(n m^{3/2})$.

Our proof of Theorem 8 is based on the following generalization of Brent &
Kung's efficient univariate modular composition (see for example Section 12.2 in
von zur Gathen & Gerhard (2003)) to a certain ‘bi-to-univariate’ variant:



Theorem 9. Fix a field K. Given $n, m \in \mathbb{N}$, a bivariate polynomial $p \in K[X, Y]$
with $\deg_X(p) < n$ and $\deg_Y(p) < m$, and univariate polynomials $g, f \in K[X]$ of
degree less than nm, all specified by their coefficients. Then $p(X, g(X)) \operatorname{rem} f(X)$
can be computed with $O^{\approx}(n m^{\omega_2/2})$ arithmetic operations in K.

We remark that true bivariate modular computation requires Gröbner basis
methods, which for complexity reasons are beyond our interest here.
5 Proofs

Now we come to the proofs. 

Lemma 10. Let K denote a field and fix $t > 0$.

(i) Let A be an $m \times m$-matrix and B an $m \times m^t$-matrix whose entries
consist of polynomials $a_{ij}(X), b_{ij}(X) \in K[X]$ of degree less than n. Given
m and the $n \cdot (m^2 + m^{1+t})$ coefficients, we can compute the coefficients of the
polynomial entries $c_{ij}(X)$ of $C := A \cdot B$ within $O^{\approx}(n m^{\omega(1,1,t)})$ arithmetic
operations.

(ii) If A denotes an $m \times m$ square matrix with polynomial entries of degree less
than n and b denotes an m-component vector of polynomials of degree less
than $n m^t$, then $(A, b) \mapsto A \cdot b$ is computable within $O^{\approx}(n m^{\omega(1,1,t)})$ arithmetic operations.

(iii) Let $p_0, \ldots, p_{m-1} \in K[X, Y]$ denote bivariate polynomials with $\deg_X(p_i) < n$
and $\deg_Y(p_i) < m$, given by their $nm^2$ coefficients, and let furthermore univariate
polynomials $g, f \in K[X]$ of degree less than $n m^t$ be given by their
coefficients. Then the coefficients of the m univariate polynomials

$$p_i(X, g(X)) \operatorname{rem} f(X)$$

can be computed with $O^{\approx}(n m^{\omega(1,1,t)})$ arithmetic operations.

In particular, for t = 1 we have cost $O^{\approx}(n m^{\omega}) \subseteq O(n m^{2.376})$ and for t = 2 we
have cost $O^{\approx}(n m^{\omega_2}) \subseteq O(n m^{3.334})$.

Proof. (i) By scalar extension to $R = K[X]$ we obtain an algorithm with cost
$O(m^{\omega(1,1,t)+\varepsilon})$ arithmetic operations in R using Observation 7. For the
algorithm, scalar extension simply means that we perform every multiplication in
R instead of K; multiplications with constants become scalar multiplications.
And the cost for one operation in R is $O^\sim(n)$, as only polynomials of degree less than
n have to be multiplied.

(ii) For each j, $0 \le j < m$, decompose the polynomial $b_j$ of degree less than
$n m^t$ into $m^t$ polynomials of degree less than n, that is, write
$b_j(X) = \sum_{0 \le k < m^t} b_{jk}(X) \cdot X^{nk}$. The desired polynomial vector is then given by

$$(A \cdot b)_i(X) = \sum_{0 \le j < m} a_{ij}(X) \sum_{0 \le k < m^t} b_{jk}(X)\, X^{nk} = \sum_{0 \le k < m^t} (A \cdot B)_{ik}(X)\, X^{nk} \tag{$*$}$$

where $0 \le i < m$ and $B := (b_{jk})$ denotes an $m \times m^t$ matrix of polynomials
of degree less than n. The product $A \cdot B$ can be computed according to (i) in
the claimed running time. Multiplication by $X^{nk}$ amounts to mere coefficient
shifts rather than arithmetic operations. And since $\deg (A \cdot B)_{ik} <
2n$, only two consecutive terms in the right-hand side of ($*$) can overlap. So
evaluating this sum amounts to $m^t$-fold addition of pairs of polynomials of
degree less than n for each i. Since $\omega(1, 1, t) \ge 1 + t$ by virtue of (5), this last cost of
$O(n m^{1+t})$ is also covered by the claimed complexity bound.
(iii) Write each $p_i$ as a polynomial in Y with coefficients from K[X], that is,

$$p_i(X, Y) = \sum_{0 \le j < m} q_{ij}(X)\, Y^j$$

with all $q_{ij}(X)$ of degree less than n. Iteratively compute the m
polynomials $g_j(X) := g^j(X) \operatorname{rem} f(X)$, each of degree less than $n m^t$, within
time $O^\sim(n m^{1+t})$ by fast division with remainder (see for example
Theorem 9.6 in von zur Gathen & Gerhard 2003).

By multiplying the matrix $A := (q_{ij})$ by the vector $b := (g_j)$ according to
(ii), determine the m polynomials

$$\tilde p_i(X) := \sum_{0 \le j < m} q_{ij}(X) \cdot g_j(X), \qquad 0 \le i < m,$$

of degree less than $n + n m^t$. For each i, reduce again modulo f(X) and
obtain $p_i(X, g(X)) \operatorname{rem} f(X)$ using another $O^\sim(n m^{1+t})$ operations. Since
$\omega(1, 1, t) \ge 1 + t$ according to (5), both parts are covered by the claimed
running time. □
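The splitting trick in the proof of Lemma 10(ii) turns a matrix-vector product with long polynomials into a matrix-matrix product with short ones. The sketch below checks only that bookkeeping. The entries $(A \cdot B)_{ik}$ are formed with schoolbook loops instead of the fast rectangular algorithm of Observation 7. The function name and the list-of-coefficient-lists encoding are our own assumptions.

```python
def poly_mul(a, b):
    """Schoolbook product of coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def mat_vec_via_blocks(A, b, n, blocks):
    """A*b for an m x m matrix A of polynomials of degree < n and a vector b
    of polynomials of degree < n*blocks, following Lemma 10(ii)."""
    m = len(A)
    # B[j][k] is the k-th chunk of b_j, so b_j = sum_k B[j][k] * X^(n*k).
    B = [[(b[j] + [0] * (n * blocks))[n * k:n * (k + 1)] for k in range(blocks)]
         for j in range(m)]
    result = []
    for i in range(m):
        acc = [0] * (n * blocks + n - 1)
        for k in range(blocks):
            # Entry (A*B)_{ik} of degree < 2n - 1 (schoolbook, not Observation 7).
            c = [0] * (2 * n - 1)
            for j in range(m):
                for d, coeff in enumerate(poly_mul(A[i][j], B[j][k])):
                    c[d] += coeff
            # Multiplying by X^(n*k) is an index shift; only neighbouring
            # chunks overlap, since deg (A*B)_{ik} < 2n.
            for d, coeff in enumerate(c):
                acc[n * k + d] += coeff
        result.append(acc)
    return result
```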

Lemma 10 puts us in position to prove Theorem 9. 

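Proof (Theorem 9). Without loss of generality we assume that m is a square.
We use a baby step, giant step strategy: Partition p into $\sqrt m$ polynomials $p_i$ of
$\deg_Y(p_i) < \sqrt m$, that is,

$$p(X, Y) = \sum_{0 \le i < \sqrt m} p_i(X, Y) \cdot Y^{i \sqrt m}.$$

Then apply Lemma 10(iii) with t = 2 and m replaced by $\sqrt m$ to obtain the $\sqrt m$
polynomials $\tilde p_i(X) := p_i(X, g(X)) \operatorname{rem} f(X)$ within $O^{\approx}(n m^{\omega_2/2})$ operations.

Iteratively determine the polynomials $\tilde g_i(X) := \big(g(X)^{\sqrt m}\big)^i \operatorname{rem} f(X)$ for
$0 \le i < \sqrt m$ within $O^\sim(n m^{3/2})$. Again, $\omega_2 \ge 3$ ensures that this remains within the
claimed bound. Finally compute

$$p(X, g(X)) \operatorname{rem} f(X) = \Big(\sum_{0 \le i < \sqrt m} \tilde p_i(X) \cdot \tilde g_i(X)\Big) \operatorname{rem} f(X)$$

using another $O^\sim(n m^{3/2})$ time. □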
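The baby step, giant step structure of this proof can be sketched with naive arithmetic. Here `p[j]` is the coefficient $q_j(X)$ of $Y^j$, so only the blocking in Y is shown, and f is assumed monic. The baby-step combinations are computed one block at a time rather than as the single matrix product of Lemma 10(iii). A Horner-based routine serves as a reference, and all names are ours.

```python
import math


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    out = [0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += c
    return out


def poly_rem(a, f):
    """Remainder modulo a monic f, padded to exactly deg f coefficients."""
    a, d = list(a), len(f) - 1
    for i in range(len(a) - 1, d - 1, -1):
        c = a[i]
        if c:
            for j in range(d + 1):
                a[i - d + j] -= c * f[j]
    r = a[:d]
    return r + [0] * (d - len(r))


def compose_mod_horner(p, g, f):
    """Reference: p(X, g(X)) rem f by Horner's rule in Y."""
    acc = [0]
    for q in reversed(p):
        acc = poly_rem(poly_add(poly_mul(acc, g), q), f)
    return acc


def compose_mod_bsgs(p, g, f):
    """Baby step, giant step: p = sum_i p_i(X, Y) * Y^(i*s), deg_Y p_i < s."""
    m = len(p)
    s = math.isqrt(m - 1) + 1 if m > 1 else 1   # s = ceil(sqrt(m))
    baby = [[1]]                                  # g^j rem f, 0 <= j < s
    for _ in range(1, s):
        baby.append(poly_rem(poly_mul(baby[-1], g), f))
    G = poly_rem(poly_mul(baby[-1], g), f)        # giant step g^s rem f
    starts = range(0, m, s)
    giant = [[1]]                                 # G^i rem f, one per block
    for _ in range(1, len(starts)):
        giant.append(poly_rem(poly_mul(giant[-1], G), f))
    total = [0]
    for i, start in enumerate(starts):
        # p_i(X, g(X)) rem f as a linear combination of the baby steps.
        block = [0]
        for j, q in enumerate(p[start:start + s]):
            block = poly_add(block, poly_mul(q, baby[j]))
        total = poly_add(total, poly_mul(poly_rem(block, f), giant[i]))
    return poly_rem(total, f)
```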

Based on Theorem 9, the following algorithm realizes the idea expressed in 
Section 2. 

Algorithm 11. Generic multipoint evaluation of a bivariate polynomial.

Input: Coefficients of a polynomial $p \in K[X, Y]$ of $\deg_X(p) < n$, $\deg_Y(p) < m$
and points $(x_k, y_k)$ for $0 \le k < nm$ with pairwise different first
coordinates $x_k$.
Output: The values $p(x_k, y_k)$ for $0 \le k < nm$.

1. Compute the univariate polynomial $f(X) := \prod_{0 \le k < nm} (X - x_k) \in K[X]$.

2. Compute an interpolation polynomial $g \in K[X]$ of degree less than nm
satisfying $g(x_k) = y_k$ for all $0 \le k < nm$.

3. Apply Theorem 9 to obtain $\tilde p(X) := p(X, g(X)) \operatorname{rem} f(X)$.

4. Multi-evaluate this univariate polynomial $\tilde p \in K[X]$ of degree less than nm
at the nm arguments $x_k$.

5. Return $(\tilde p(x_k))_{0 \le k < nm}$.



Proof (Theorem 8). The algorithm is correct by construction: since $f(x_k) = 0$
and $g(x_k) = y_k$, we have $\tilde p(x_k) = p(x_k, g(x_k)) = p(x_k, y_k)$ for all k.

Step 1 in Algorithm 11 can be done in $O^\sim(nm)$ arithmetic operations. As the
points $(x_k, y_k)$ have pairwise different first coordinates, the interpolation problem
in Step 2 is solvable and, by virtue of Fact 1(iii), takes running time $O^\sim(nm)$. For
Step 3, Theorem 9 guarantees running time $O^{\approx}(n m^{\omega_2/2})$. According to Fact 1(ii),
Step 4 is possible within time $O^\sim(nm)$. Summing up, we obtain the claimed
running time. □
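Putting the pieces together, here is a minimal end-to-end sketch of Algorithm 11 over $\mathbb{Q}$. It assumes exact `fractions.Fraction` arithmetic and pairwise distinct first coordinates. Step 2 uses quadratic Lagrange interpolation, and Step 3 uses Horner's rule in Y modulo f instead of Theorem 9. The sketch therefore shows correctness, not the complexity bound. The function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    out = [0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += c
    return out


def poly_rem(a, f):
    """Remainder modulo a monic f, padded to exactly deg f coefficients."""
    a, d = list(a), len(f) - 1
    for i in range(len(a) - 1, d - 1, -1):
        c = a[i]
        if c:
            for j in range(d + 1):
                a[i - d + j] -= c * f[j]
    r = a[:d]
    return r + [0] * (d - len(r))


def horner(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def interpolate(xs, ys):
    """Lagrange interpolation: g with deg g < len(xs) and g(x_k) = y_k."""
    g = [Fraction(0)] * len(xs)
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis, denom = [Fraction(1)], Fraction(1)
        for other, xo in enumerate(xs):
            if other != k:
                basis = poly_mul(basis, [-xo, Fraction(1)])
                denom *= xk - xo
        for d, c in enumerate(basis):
            g[d] += yk * c / denom
    return g


def evaluate_bivariate(p, points):
    """Algorithm 11 with naive subroutines; p[i][j] is the coeff of X^i Y^j."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    f = [Fraction(1)]
    for x in xs:                               # Step 1: f = prod (X - x_k)
        f = poly_mul(f, [-x, Fraction(1)])
    g = interpolate(xs, ys)                    # Step 2: g(x_k) = y_k
    acc = [Fraction(0)]                        # Step 3: p(X, g(X)) rem f,
    for j in reversed(range(len(p[0]))):       # here by Horner in Y
        q_j = [Fraction(row[j]) for row in p]
        acc = poly_rem(poly_add(poly_mul(acc, g), q_j), f)
    return [horner(acc, x) for x in xs]        # Steps 4 and 5
```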



6 Evaluating at Degenerate Points 

Here we indicate how certain fields K allow us to remove the condition on the
evaluation point set imposed in Theorem 8. The idea is to rotate or shear the
situation slightly, so that afterwards the point set has pairwise different first
coordinates. To this end choose $\theta \in K$ arbitrarily such that

$$\#\{x_k + \theta y_k \mid 0 \le k < N\} = N \tag{12}$$

where N := nm denotes the number of points. Then replace each $(x_k, y_k)$ by
$(x'_k, y'_k) := (x_k + \theta y_k, y_k)$ and the polynomial p by $p'(X, Y) := p(X - \theta Y, Y)$, so that
$p'(x'_k, y'_k) = p(x_k, y_k)$. This can be done with $O^\sim(n^2 + m^2)$ arithmetic operations; see the more general
Lemma 14 below. In any case, a perturbation like this might even be a good
idea for reasons of numerical stability if there are points whose first coordinates are ‘almost equal’.

Lemma 13. Let K denote a field and $P = \{(x_k, y_k) \in K^2 \mid 0 \le k < N\}$ a
collection of N planar points.

(i) If $\#K \ge N^2$, then $\theta \in K$ chosen uniformly at random satisfies (12) with
probability at least 1/2. Using $O(\log N)$ guesses and a total of $O(N \cdot \log^2 N)$
operations, we can thus find an appropriate θ with high probability.

If K is even infinite, a single guess almost certainly suffices.

(ii) In case $K = \mathbb{R}$ or $K = \mathbb{C}$, we can deterministically find an appropriate θ in
time $O(N \cdot \log N)$.

(iii) For a fixed proper extension field L of K, any $\theta \in L \setminus K$ will do.
Applying (i) or (ii) together with Lemma 14 affects the running time of
Theorem 8 only by the possible change in the Y-degree. Using (iii) means that all
subsequent computations must be performed in L. This increases all further costs
by no more than an additional constant factor depending only on the degree
[L : K].

Proof. (i) Observe that an undesirable θ with $x_k + \theta y_k = x_{k'} + \theta y_{k'}$ implies
$y_k = y_{k'}$ (and then also $x_k = x_{k'}$, so the two points coincide) or
$\theta = (x_{k'} - x_k)/(y_k - y_{k'})$. In the latter case, θ is thus uniquely determined by
$\{k, k'\}$. Since there are at most $\binom{N}{2} < N^2/2$ such choices $\{k, k'\}$, no more
than half of the $\#K \ge N^2$ possible values of θ can be undesirable.

(ii) If $K = \mathbb{R}$, choose $\theta > 0$ such that $\theta \cdot (y_{\max} - y_{\min}) < \min\{x_k - x_{k'} \mid x_k > x_{k'}\}$.
Such a value θ can be found in linear time after sorting the points with
respect to their x-coordinate.

In case $K = \mathbb{C}$, we can do the same with respect to the real parts.

(iii) Simply observe that 1 and θ are linearly independent over K. □
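Over $\mathbb{Q}$, standing in for $\mathbb{R}$, the construction from the proof of Lemma 13(ii) takes a few lines. The sketch assumes pairwise distinct input points and picks θ as half of the admissible bound. If only one x-value or only one y-value occurs, any θ > 0 works, so it returns 1. The function name is hypothetical.

```python
from fractions import Fraction


def shear_parameter(points):
    """theta > 0 with x + theta*y pairwise distinct (cf. Lemma 13(ii)).

    Assumes rational coordinates and pairwise distinct points; the sort
    dominates the O(N log N) running time.
    """
    xs = sorted(set(Fraction(x) for x, _ in points))
    ys = [Fraction(y) for _, y in points]
    spread = max(ys) - min(ys)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if spread == 0 or not gaps:
        return Fraction(1)
    # theta * (y_max - y_min) must stay below the smallest positive x-gap.
    return min(gaps) / (2 * spread)


pts = [(0, 0), (0, 1), (1, 0), (1, 3), (2, 2)]
print(shear_parameter(pts))  # 1/6
```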

We now state the already announced 

Lemma 14. Let R be a commutative ring with one. Given $n \in \mathbb{N}$ and the $n^2$
coefficients of a polynomial $p(X, Y) \in R[X, Y]$ of degree less than n in both
X and Y, and given furthermore a matrix $A \in R^{2 \times 2}$ and a vector $b \in R^2$, we
can compute the coefficients of the affinely transformed polynomial
$p(a_{11}X + a_{12}Y + b_1,\ a_{21}X + a_{22}Y + b_2)$ using $O(n^2 \cdot \log^2 n \cdot \log\log n)$, that is, $O^\sim(n^2)$,
arithmetic operations over R.

In the special case $R = \mathbb{C}$ we can decrease the running time to $O(n^2 \log n)$.

Lemma 14 straightforwardly generalizes to d-variate polynomials and d-dimensional
affine transformations, being applicable within time $O^\sim(n^d)$ for fixed d.

Proof. We prove this in several steps.

— First we note that, over any commutative ring S with one, we can compute
the Taylor shift $p(X + a)$ of a polynomial $p \in S[X]$ of degree less than n by
an element $a \in S$ using $O(n \cdot \log^2 n \cdot \log\log n)$ arithmetic operations in S.
There are many solutions for computing the Taylor shift of a polynomial.
We would like to sketch the divide and conquer solution from Fact 2.1(iv) in
von zur Gathen (1990) that works over any ring S: Precompute all powers
$(X + a)^{2^i}$ for $0 \le i < \ell := \lceil \log_2 n \rceil$. Then recursively split $p(X) = p_0(X) +
X^{2^i} p_1(X)$ with $\deg p_0 < 2^i$, where $2^i$ is the largest power of two below the
current number of coefficients, and calculate $p(X + a) = p_0(X + a) + (X + a)^{2^i} p_1(X + a)$.
This amounts to $O(n \cdot \log^2 n \cdot \log\log n)$ multiplications in
S and $O(n \log n)$ other operations. So we achieve this over any ring S with
$O(n \cdot \log^2 n \cdot \log\log n)$ operations.

— Next let $S = R[Y]$. Then we can use the previous step to compute $p(X + a, Y)$
or $p(X + aY, Y)$ for a polynomial $p \in R[X, Y] = S[X]$ of maximum degree
less than n and an element $a \in R$. Using Kronecker substitution for the
multiplications in R[X, Y], this can be done with $O(n^2 \cdot \log^2 n \cdot \log\log n)$
arithmetic operations in R.

— Now we prove the assertion. Scaling is easy: $p(X, Y) \mapsto p(aX, Y)$ obviously
works within $O(n^2)$ steps. Use this and the discussed shifts once or twice.

The solution to Problem 2.6 in Bini & Pan (1994) allows us to save a factor $\log n \cdot
\log\log n$ when $R = S = \mathbb{C}$. □
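The divide and conquer Taylor shift from the first step of this proof can be sketched as follows. It caches $(X + a)^{2^i}$ by repeated squaring and splits at the largest power of two below the current number of coefficients. Multiplication is schoolbook, so the sketch shows the recursion rather than the stated operation count. The function name is our own.

```python
def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def taylor_shift(p, a):
    """Coefficients of p(X + a), lowest degree first, by divide and conquer."""
    powers = {1: [a, 1]}  # cache of (X + a)^h for h a power of two

    def power(h):
        if h not in powers:
            half = power(h // 2)
            powers[h] = poly_mul(half, half)
        return powers[h]

    def rec(q):
        if len(q) <= 1:
            return list(q)
        h = 1
        while 2 * h < len(q):  # largest power of two below len(q)
            h *= 2
        lo, hi = rec(q[:h]), rec(q[h:])  # q = lo + X^h * hi
        out = poly_mul(power(h), hi)     # (X + a)^h * hi(X + a)
        for i, c in enumerate(lo):
            out[i] += c
        return out

    return rec(p)


print(taylor_shift([1, 1], 3))  # X + 1 becomes X + 4: [4, 1]
```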



7 Conclusion and Further Questions 

We lowered the upper complexity bound for multi-evaluating dense bivariate
polynomials of degree less than n with $n^2$ coefficients at $n^2$ points with
pairwise different first coordinates from naive $O(n^4)$ and $O^\sim(n^3)$ to $O(n^{2.667})$. The
algorithm is based on fast univariate polynomial arithmetic together with fast
matrix multiplication and will immediately benefit from any future improvement
of the latter.

With the same technique, evaluation of a trivariate polynomial of maximum
degree less than n at $n^3$ points can be accelerated from naive $O(n^6)$ to

Given that the matrix multiplication method of Huang & Pan (1998) has
huge constants hidden in the big-Oh notation, it might in practice be preferable
to use either the naive method with $2m^3$ operations or Strassen’s algorithm (with some tricks).
Applying them to our approach still yields bivariate multipoint evaluation within
time $O(n^3)$ or $O(n^{2.91})$, respectively, with small big-Oh constants and no hidden
factors log n in the leading term, that is, faster than Theorem 3.

Further questions to consider are: 

— Is it possible to remove even the divisions? This would give a much more
stable algorithm, and it would also work over many rings.

— As $\omega \ge 2$, the above techniques will never get below running times of order
$n^{2.5}$. Can we achieve an upper complexity bound as close as $O^\sim(n^2)$ to the
information-theoretic lower bound?

— Can multipoint evaluation of trivariate polynomials p(Xi, X 2 , X^) be per- 
formed in time 0 ( 71 “^)? 

Acknowledgements. The authors wish to thank David Eppstein (2004) for 
an inspiring suggestion that finally led to Lemma 13(ii). 
Abstract. This paper considers integer sorting on a RAM. We show
that adaptive sorting of a sequence with qn inversions is asymptotically
equivalent to multisorting groups of at most q keys, and a total of n
keys. Using the recent $O(n\sqrt{\log\log n})$ expected time sorting of Han and
Thorup on each set, we immediately get an adaptive expected sorting
time of $O(n\sqrt{\log\log q})$. Interestingly, for any positive constant ε, we show
that multisorting and adaptive inversion sorting can be performed in
linear time if $q \le 2^{(\log n)^{1-\varepsilon}}$. We also show how to asymptotically improve
the running time of any traditional sorting algorithm on a class of inputs
much broader than those with few inversions.



1 Introduction 

Sorting is one of the most researched problems in computer science, both due to 
the fundamental nature of the problem and due to its practical importance. An 
important extension to basic sorting algorithms is adaptive sorting algorithms, 
which take advantage of any existing “order” in the input sequence to achieve 
faster sorting. This is a classical example of the trend in modern computer science 
to express the complexity of an algorithm not only in terms of the size of the 
problem instance, but also in terms of properties of the problem instance. On 
the practical side, it has been observed that many real-life instances of sorting 
problems do indeed have structure (see, e.g., [12, p. 54] or [13, p. 126]), and this 
can be exploited for faster sorting. 

A classical measure of order, or rather disorder, is the number of inversions 
in the sequence, that is, the number of pairs of elements that are placed in wrong 
order compared to the sorted sequence. Using finger search trees one can obtain 
a sorting algorithm that sorts a sequence of n elements with qn inversions in 
time $O(n \log q)$, and there is also a lower bound of $\Omega(n \log q)$ on the number of
comparisons needed to sort such a sequence in the worst case [8].
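To make the measure concrete, here is a standard merge-sort count of inversions using $O(n \log n)$ comparisons. It only illustrates the quantity qn and is not a method from this paper. The function name is our own.

```python
def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j], via merge sort."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                inv += len(left) - i  # right item precedes all remaining left items
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(list(seq))[1]


print(count_inversions([2, 4, 1, 3, 5]))  # 3: (2,1), (4,1), (4,3)
```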

After Fredman and Willard’s seminal work [6], there has been extensive
interest in algorithms that exploit features of real computers to break comparison-based
lower bounds for the case of integer keys stored in single words, assuming
unit cost operations on words. The model used for such studies is the so-called
word RAM. The main idea, which goes back to classical techniques such as radix
sort and hashing, is to exploit that one can work with the representation of
integers. In combination with tabulation and word-level parallelism techniques, this
has led to asymptotically very fast sorting algorithms. The currently fastest
deterministic and randomized sorting algorithms use $O(n \log\log n)$ time and
$O(n\sqrt{\log\log n})$ expected time, respectively, and linear space [9,10].

The basic research question in this paper is how well we can exploit few
inversions in integer sorting. The most obvious way forward is to use previous
reductions from inversion sort to finger searching, but for integer keys, this approach
has a matching upper and lower bound of $\Theta(n\sqrt{\log q/\log\log q})$ time [3,4].
However, in this paper, we will get down to time $O(n \log\log q)$ and $O(n\sqrt{\log\log q})$
expected time, matching the best bounds for regular sorting. Also, more
surprisingly, we will show that we can sort in linear time if, for some constant ε > 0, we
have $q \le 2^{(\log n)^{1-\varepsilon}}$.

Our results are achieved via reduction to what we call multisorting, which
is the problem of sorting several groups of keys with some bound on the size of
a group. The output is, for each group, a list of the keys of that group in sorted
order.

To appreciate the power of multisorting, suppose the total number of keys
over all groups is n and that all the keys have values in [n]. We can now prefix
each key with the group it belongs to, so that the values become pairs (i, x) where
i is the group and x is the original value. Using an array over [n], we can sort the
keys with the new values in linear time with a 2-pass radix sort. Each group now
has its keys sorted in a segment of the whole list, and these segments are easily
extracted. Thus, the whole multisort is performed in linear time. If, however, the
group size q is much smaller than n, say, q = we would not know how to 

sort the groups individually in linear time. This power of multisorting has been 
used for regular (non-adaptive) sorting in [9]. 
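The linear-time multisort for keys in [n] described above can be sketched as a two-pass stable bucket sort (LSD radix sort) on the pairs (i, x): first by value, then by group. We assume group indices and values both lie in range(n). The function name is hypothetical.

```python
def multisort(groups, n):
    """Sort every group of keys in range(n) with one 2-pass LSD radix sort."""
    # Prefix each key with its group: values become pairs (i, x).
    pairs = [(gid, x) for gid, keys in enumerate(groups) for x in keys]

    def bucket_pass(items, key, buckets):
        bins = [[] for _ in range(buckets)]
        for item in items:
            bins[key(item)].append(item)  # stable: input order kept per bucket
        return [item for b in bins for item in b]

    pairs = bucket_pass(pairs, lambda t: t[1], n)            # pass 1: value
    pairs = bucket_pass(pairs, lambda t: t[0], len(groups))  # pass 2: group
    out = [[] for _ in groups]
    for gid, x in pairs:
        out[gid].append(x)  # each group is a sorted contiguous segment
    return out


print(multisort([[3, 1, 2], [0, 0, 4], [5], []], 6))  # [[1, 2, 3], [0, 0, 4], [5], []]
```

Both passes take O(n + number of buckets) time, which is linear because there are at most n groups and n values.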

We show in Section 3 that inversion sorting and multisorting are equivalent 
in a strong sense, namely, that we can multisort using only one inversion sorting 
and linear extra time and the other way around. 

Theorem 1. The following problems can be reduced to each other in O(n) time
and space on a RAM with word size w:

— Multisorting: Sort n w-bit integer keys in groups, where each group consists
of at most q keys.

— Inversion sorting: Sort a list of n w-bit integer keys with a known upper
bound qn on the number of inversions.

The reduction of inversion sorting to multisorting assumes that an upper bound qn on the number of inversions is known. However, it is easy to make the algorithm automatically adapt to the number of inversions, with only a constant factor increase in running time. The idea is well known: perform doubling search for the running time of the inversion sorting algorithm. For a precise statement of this reduction see Section 2.

Multisorting can of course be done by sorting each group separately. If a group has at most q keys, the recent randomized sorting algorithm by Han and Thorup [10] sorts it in $O(\sqrt{\log\log q})$ expected time per key. Thus, for multisorting n keys in groups of size at most q, we get an algorithm running in $O(n\sqrt{\log\log q})$ expected time. More interestingly, in Section 4, we show that we can efficiently multisort much larger groups than we can sort individually. In particular, for any positive constant $\varepsilon$, we show that we can multisort in linear time with group size up to $2^{(\log n)^{1-\varepsilon}}$. Note that this is close to best possible in that if $\varepsilon$ were 0, this would be general linear time sorting. On the other hand, if the groups are to be sorted individually, the largest group size we can handle in linear time is only $2^{(\log n)^{1/2-\varepsilon}}$, using the signature sorting of Andersson et al. [2].

Our efficient new multisorting result is as follows. 

Theorem 2. On a RAM, we can multisort n integer keys in groups of size at most q in linear space and $O(n(\log\log n)/(1 + \log\log n - \log\log q))$ expected time. In particular, this is linear time for $q \le 2^{(\log n)^{1-\varepsilon}}$, for any constant $\varepsilon > 0$.

Combining with Theorem 1, we get the following adaptive sorting result: 

Corollary 1. We can sort n keys with qn inversions in linear space and $O(n(\log\log n)/(1 + \log\log n - \log\log q))$ expected time. In particular, this is linear time for $q \le 2^{(\log n)^{1-\varepsilon}}$, for any constant $\varepsilon > 0$.

1.1 Our Overlooked Reduction 

In a slightly weaker form (cf. Lemma 1), our reduction from inversion sorting to multisorting is quite simple and works on a comparison-based pointer-machine model of computation. We find it quite surprising that it has been overlooked for so long. Of course, in the classical comparison-based model, from a theoretical perspective, the reduction to finger searching is optimal in that it gets down to O(n log q) time for qn inversions.

From a practical perspective, however, a finger search data structure, with all its parent, child, and horizontal pointers, is fairly slow and has a large space overhead. With our reduction, we partition our data set arbitrarily into groups of some limited size, which we sort with our favorite tuned sorting algorithm, be it quicksort, merge sort, or even insertion sort if the groups are small enough.

1.2 Other Measures of Disorder 

It is folklore knowledge that if a sequence has r runs, that is, r contiguous increasing or decreasing subsequences, then one can sort in time O(n f(r)), where f(r) is the time per operation on a priority queue with r keys. This time bound is incomparable to bounds in terms of the number of inversions. However, in Section 5 we consider the disorder measure Osc [11], which is a natural common generalization of both inversions and runs. Based on any non-adaptive sorting algorithm, we get a sorting algorithm that is extra efficient on inputs with low Osc disorder. In particular, the obtained algorithm will perform well on inputs with a small number of inversions or runs.
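The folklore run-based bound can be illustrated with a short Python sketch. It makes two assumptions: runs are taken to be maximal non-decreasing or strictly decreasing contiguous stretches, with decreasing ones reversed, and Python's binary-heap `heapq.merge` plays the role of the priority queue, which gives f(r) = O(log r).

```python
import heapq


def split_runs(xs):
    """Split xs into maximal contiguous runs, each returned in ascending order."""
    runs, i, n = [], 0, len(xs)
    while i < n:
        j = i + 1
        if j < n and xs[j] < xs[i]:
            # Strictly decreasing run: extend it, then reverse it.
            while j < n and xs[j] < xs[j - 1]:
                j += 1
            runs.append(xs[i:j][::-1])
        else:
            # Non-decreasing run.
            while j < n and xs[j] >= xs[j - 1]:
                j += 1
            runs.append(xs[i:j])
        i = j
    return runs


def sort_by_runs(xs):
    """k-way merge of the r runs; heapq.merge keeps one head per run in a heap."""
    return list(heapq.merge(*split_runs(xs)))
```

For instance, `[1, 2, 3, 9, 8, 7, 4, 5, 6]` splits into the three runs `[1, 2, 3, 9]`, `[4, 7, 8]` and `[5, 6]` before merging.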
2 Eliminating the Need for a Bound on Inversion Number

From Theorem 1 it follows that if we can multisort n keys with group size at most q in time S(n, q), then we can sort n keys with up to qn inversions in O(S(n, q)) time. Within this time bound, we can also output an error if we run into problems because the number of inversions is bigger than qn. However, in adaptive sorting, no one tells us the number of inversions. On the other hand, suppose we can generate a sequence $q_0, q_1, \ldots$ with $q_0 = 1$ such that $S(n, q_i) = \Theta(2^i n)$. Then we can run our algorithm for up to $nq_i$ inversions for $i = 0, 1, \ldots$ until we successfully produce a sorted list. Even if the number qn of inversions is unknown, we get a running time of

$$\sum_{i \ge 0 \,:\, q_{i-1} < q} O(S(n, q_i)) = O(S(n, q)).$$

For example, with the function $S(n, q) = O(n(\log\log n)/(1 + \log\log n - \log\log q))$ from Theorem 2, we can take $q_i = 2^{(\log n)^{1-1/2^i}}$ for $i \ge 1$. The time available for computing $q_i$ is $O(S(n, q_i))$, which is far more than we need.

Very generally, when we have a function like $S(n, \cdot)$ for which we can find a sequence of arguments that asymptotically doubles the value, we call the function double-able. Here, by finding the sequence, we mean that we can construct the relevant parts of it without affecting our overall time and space consumption.
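The doubling search can be sketched in Python. The bounded sorter here is a hypothetical stand-in: an insertion sort that gives up after more than qn swaps, so that S(n, q) = O(n + qn). For this sorter, $q_i = 2^i$ doubles the running time, which is exactly the double-able condition. All names are illustrative.

```python
class TooManyInversions(Exception):
    """Raised when the input has more inversions than the sorter was promised."""


def bounded_sort(xs, budget):
    """Hypothetical bounded inversion sorter: insertion sort that aborts once
    it has made more than `budget` swaps (each swap removes one inversion)."""
    a = list(xs)
    swaps = 0
    for j in range(1, len(a)):
        k = j
        while k > 0 and a[k - 1] > a[k]:
            a[k - 1], a[k] = a[k], a[k - 1]
            swaps += 1
            if swaps > budget:
                raise TooManyInversions()
            k -= 1
    return a


def adaptive_sort(xs, q_values):
    """Doubling search: try q_0, q_1, ... until the bounded sorter succeeds.
    Returns the sorted list and the q_i that worked."""
    n = max(len(xs), 1)
    for q in q_values:
        try:
            return bounded_sort(xs, q * n), q
        except TooManyInversions:
            continue  # wasted work is dominated by the next, doubled attempt
    raise ValueError("no q_i in the sequence was large enough")
```

On the reversed list `[9, 8, ..., 0]` (45 inversions, n = 10), the attempts with q = 1, 2, 4 fail and q = 8 succeeds.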

3 Equivalence Between Inversion Sorting and Multisorting

In this section we show Theorem 1. 

Lemma 1. For any integer r > 1 we can reduce, with an additive O(n) overhead in time, inversion sorting of n integer keys with at most qn inversions to performing the following tasks:

(i) multisorting n integer keys in groups with at most r integer keys in each, followed by

(ii) inversion sorting 4qn/r integer keys with an upper bound qn on the number of inversions.

Proof. The following algorithm will solve the problem as described in the lemma. 

1. Divide the n keys into $\lceil n/r \rceil$ groups with r keys in each group, except possibly the last. Denote the groups $G_1, \ldots, G_{\lceil n/r \rceil}$. For ease of exposition, let $G_{\lceil n/r \rceil + 1}$ be an empty group.

2. Multisort the n keys in the groups of size at most r, and let $G_i[1], G_i[2], \ldots$ be the sorted order of the keys in group i.

3. Initialize $\lceil n/r \rceil + 1$ empty lists, denoted $S_1$ and $U_1, \ldots, U_{\lceil n/r \rceil}$.

4. Merge all the groups by looking at two groups at a time. For $i = 1, \ldots, \lceil n/r \rceil$, add the keys in groups $G_i$ and $G_{i+1}$ to $S_1$, $U_i$, or $U_{i+1}$ one at a time, as described below. We say that a key in a group $G_i$ is unused if it has not been added to any of the lists $S_1$ or $U_j$.

While there are unused keys in $G_i$ and no more than r/2 used keys in $G_{i+1}$, we repeatedly find the minimum unused key in $G_i$ and $G_{i+1}$. In case of a tie, the key from $G_i$ is considered the minimum. The minimum is inserted in $S_1$ if the previous key inserted in $S_1$ is not larger; otherwise it is inserted in $U_i$ or $U_{i+1}$, depending on the group it came from. Any remaining keys in $G_i$ after this are inserted in $U_i$.

5. Let U be the concatenation of the lists $U_1, \ldots, U_{\lceil n/r \rceil}$.

6. Inversion sort the list U with qn as an upper bound on the number of inversions. Denote the resulting sorted list $S_2$.

7. Merge the two sorted lists $S_1$ and $S_2$.

It is easy to see that the two merging stages, in steps 4 and 7 of the algorithm, take linear time and that the multisorting in step 2 is as stated in the lemma. Keys from different groups appear in U in the same relative order as in the original sequence, and keys from the same group appear in sorted order. Hence every inversion in U is also an inversion in the original sequence, and the number of inversions in U is bounded by qn.

What remains to show is that the list U has size at most 4qn/r. We argue that in all cases in step 4 where a key is added to the list $U_i$ or $U_{i+1}$, this key is involved in at least r/2 inversions. First of all, the minimum key can only be smaller than the last key inserted in $S_1$ when we have just increased i, and it must then come from $G_{i+1}$. But then it must have at least r/2 inversions with unused keys in $G_i$. Secondly, whenever we put remaining keys from $G_i$ in $U_i$, each of these has at least r/2 inversions with used keys from $G_{i+1}$. Since each key in U is involved in at least r/2 inversions, there are at most qn inversions in total, and each inversion involves two keys, we get $|U| \cdot r/2 \le 2qn$, that is, an upper bound of 4qn/r on the size of U.
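Steps 1-7 can be followed literally in a short Python sketch. The sketch rests on some assumptions of ours. Python's `sorted()` stands in both for the multisorting of step 2 and for the inversion sorting of step 6. Groups are 0-indexed. `count_inversions` is a quadratic helper, meant only for checking |U| against the 4qn/r bound on small inputs.

```python
import heapq


def count_inversions(a):
    """Number of pairs p < q with a[p] > a[q] (quadratic; for checking only)."""
    return sum(1 for p in range(len(a)) for q in range(p + 1, len(a)) if a[p] > a[q])


def lemma1_sort(keys, r):
    """Sort keys via the Lemma 1 reduction; also return the list U of step 5."""
    n = len(keys)
    # Steps 1-2: groups of r consecutive keys, each sorted (stand-in for
    # multisorting), followed by an empty sentinel group.
    groups = [sorted(keys[j:j + r]) for j in range(0, n, r)] + [[]]
    m = len(groups) - 1
    # Step 3: the list S_1 and the lists U_i (index m stays empty).
    s1, u = [], [[] for _ in range(m + 1)]
    used = [0] * (m + 1)  # used[i]: keys of group i already placed in a list
    # Step 4: look at the groups two at a time.
    for i in range(m):
        gi, gj = groups[i], groups[i + 1]
        while used[i] < len(gi) and 2 * used[i + 1] <= r:
            # Minimum unused key of the two groups; ties favour group i.
            if used[i + 1] == len(gj) or gi[used[i]] <= gj[used[i + 1]]:
                src = i
            else:
                src = i + 1
            x = groups[src][used[src]]
            used[src] += 1
            if not s1 or s1[-1] <= x:
                s1.append(x)        # keeps S_1 sorted
            else:
                u[src].append(x)    # out of order: defer to U_i or U_{i+1}
        u[i].extend(gi[used[i]:])   # remaining keys of group i go to U_i
        used[i] = len(gi)
    # Step 5: concatenate the U_i.
    big_u = [x for part in u for x in part]
    # Steps 6-7: sort U (stand-in for inversion sorting) and merge with S_1.
    return list(heapq.merge(s1, sorted(big_u))), big_u
```

On already sorted input every key lands in $S_1$ and U stays empty, so the expensive step 6 has nothing to do.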

Corollary 2. On a RAM, inversion sorting of n integer keys, given an upper bound qn on the number of inversions, can be reduced to multisorting n integer keys in groups of size at most q, using O(n) time and space.

Proof. Using Lemma 1 with r = q log n, we get that the time for inversion sorting is O(n) plus the time for multisorting n keys in groups of size at most q log n and inversion sorting 4n/log n keys with at most qn inversions. Sorting 4n/log n keys can be done in O(n) time by any optimal comparison-based sorting algorithm.

Multisorting n keys in groups of size at most q log n can be done by multisorting the n keys in groups of size at most q and merging the groups into larger ones in O(n) time. A group of size at most q log n is divided into at most log n subgroups of size at most q. To merge these sorted subgroups into one sorted group, start by inserting the minimum from each subgroup into an atomic heap [7]. Repeatedly remove the minimum from the heap and let it be the next key in the merged group. If the deleted minimum comes from subgroup i, then insert the next key, in sorted order, from subgroup i into the atomic heap. Insertion, deletion, and findmin in an atomic heap with at most log n keys take O(1) time, after O(n) preprocessing time [7]. The same atomic heap can be used for all mergings, and the total time to merge all groups is hence O(n).
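The merging step can be sketched in Python as follows. Python's `heapq` (a binary heap) replaces the atomic heap, so each heap operation costs O(log log n) here rather than O(1). `sorted()` stands in for multisorting the subgroups of size at most q. The function names are hypothetical.

```python
import heapq


def merge_subgroups(subgroups):
    """Merge sorted subgroups, keeping the current head of each in a heap."""
    heap = [(sg[0], k, 0) for k, sg in enumerate(subgroups) if sg]
    heapq.heapify(heap)
    merged = []
    while heap:
        key, k, pos = heapq.heappop(heap)   # overall minimum
        merged.append(key)
        if pos + 1 < len(subgroups[k]):     # next key of the same subgroup
            heapq.heappush(heap, (subgroups[k][pos + 1], k, pos + 1))
    return merged


def multisort_larger_groups(groups, q):
    """Multisort groups of size up to about q*log n from subgroups of size <= q."""
    result = []
    for g in groups:
        subgroups = [sorted(g[j:j + q]) for j in range(0, len(g), q)]
        result.append(merge_subgroups(subgroups))
    return result
```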

Lemma 2. On a RAM with word size w, multisorting n w-bit integer keys in groups of size at most q can be done by inversion sorting n w-bit integer keys with an upper bound of qn inversions plus O(n) extra time and space.

Proof. The following algorithm will solve the problem as described in the lemma. Assume that the word length w is at least $2\lceil \log n \rceil$. If it is smaller, the sorting can be done in linear time using radix sort.

1. Sort all the keys with respect to the $2\lceil \log n \rceil$ least significant bits, using radix sort. Keep a reference to the original position as extra information.

2. Distribute the keys into the original groups, using the references, such that each group is now sorted according to the $2\lceil \log n \rceil$ least significant bits.

3. For each key x, construct a new w-bit word $d_x$, where the first $\lceil \log n \rceil$ bits contain the number of the group that the key comes from, the following $w - 2\lceil \log n \rceil$ bits contain the $w - 2\lceil \log n \rceil$ most significant bits of x, and the last $\lceil \log n \rceil$ bits contain x's rank after the sorting and redistribution in steps 1 and 2.

4. Inversion sort the words constructed in step 3 with qn as an upper bound on the number of inversions.

5. Construct the multisorted list by scanning the list from step 4 and using the rank (the last $\lceil \log n \rceil$ bits) to find the original key.

Steps 1, 2, 3, and 5 can clearly be implemented to run in O(n) time, and provided that qn is an upper bound on the number of inversions for the sorting in step 4, the time for the algorithm is as stated in the lemma. To see that qn is an upper bound on the number of inversions, note that since the most significant bits in the words sorted in step 4 encode the group number, there can be no inversions between words in different groups. Since a group contains at most q keys, there are at most q inversions per word, and qn is an upper bound on the total number of inversions.

What remains to show is that the algorithm is correct. For two keys x and y in the same group with x < y, the words $d_x$ and $d_y$ constructed in step 3 satisfy $d_x < d_y$. Either x and y differ in their $w - 2\lceil \log n \rceil$ most significant bits, which the words copy, or they agree there, and then x precedes y in the order of step 2 and so has the smaller rank. This, together with the fact that the group number is written in the most significant bits of the words sorted in step 4, implies that the list produced by the last step is the input list multisorted as desired.
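A Python sketch of the word construction follows. It assumes nonnegative w-bit keys held in Python ints, L = ⌈log₂ n⌉, and w ≥ 2L. A stable sort by the low 2L bits stands in for steps 1 and 2, and a plain sort of the words $d_x$ stands in for the inversion sort of step 4. The function name is ours.

```python
def multisort_via_words(groups, w):
    """Multisort nonnegative w-bit keys with one sort of packed words d_x."""
    n = sum(len(g) for g in groups)
    L = max(1, (n - 1).bit_length())          # ceil(log2 n) bits
    if w < 2 * L or len(groups) > (1 << L):
        raise ValueError("need w >= 2L and at most 2**L groups")
    if any(not 0 <= x < (1 << w) for g in groups for x in g):
        raise ValueError("keys must be nonnegative w-bit integers")
    low_mask = (1 << (2 * L)) - 1
    ranked = []   # ranked[g][rank] = original key, after steps 1-2
    words = []
    for g, keys in enumerate(groups):
        # Steps 1-2: order the group by its 2L least significant bits.
        by_low = sorted(keys, key=lambda x: x & low_mask)
        ranked.append(by_low)
        for rank, x in enumerate(by_low):
            high = x >> (2 * L)               # the w - 2L most significant bits
            # Step 3: group number | high bits of x | rank.
            words.append((g << (w - L)) | (high << L) | rank)
    # Step 4: stand-in for inversion sorting the words.
    words.sort()
    # Step 5: decode group and rank to recover each original key.
    rank_mask = (1 << L) - 1
    result = [[] for _ in groups]
    for d in words:
        g = d >> (w - L)
        result[g].append(ranked[g][d & rank_mask])
    return result
```

Because the group number occupies the top bits, words from different groups never interleave after the sort, which is the property the proof uses to bound the inversions.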

Theorem 1 follows from Corollary 2 and Lemma 2. 

4 Fast Multisorting on a RAM 

Multisorting can of course be done by sorting one group at a time. In particular, by [9,10], multisorting can be done in linear space and $O(n\sqrt{\log\log q})$ expected time or $O(n\log\log q)$ deterministic time.
If $\log q < w^{1/2-\varepsilon}$ for some $\varepsilon > 0$, we can use the linear expected time sorting of Andersson et al. [2] on each group. Since $w \ge \log n$, this gives linear expected time sorting for $q \le 2^{(\log n)^{1/2-\varepsilon}}$. In this section we show how to exploit the situation of sorting multiple groups to increase the group size to $2^{(\log n)^{1-\varepsilon}}$, which is much closer to n. This gives linear time inversion sorting algorithms for a wide range of parameters.

For our fast randomized sorting algorithm we use the following slight variant 
from [10] of the signature sorting of Andersson et al. [2]. 

Lemma 3. With an expected linear-time additive overhead, signature sorting with parameter r reduces the problem of sorting q integer keys of $\ell$ bits to

(i) the problem of sorting q reduced integer keys of $4r\log q$ bits each and

(ii) the problem of sorting q integer fields of $\ell/r$ bits each.

Here (i) has to be solved before (ii).

Employing ideas from [10], we use the above lemma to prove 

Lemma 4. Consider multisorting n integer keys in groups of size at most q. In linear expected time and linear space, we can reduce the length of keys by a factor $(\log n)/(\log q)$.

Proof. To each of the groups $S_i$ of size at most q, we apply the signature sort from Lemma 3 with $r = (\log n)/(\log q)$. Then the reduced keys from (i) have $4r\log q = O(\log n)$ bits.

We are going to sort the reduced keys from all the groups together, but prefixing keys from $S_i$ with i. The prefixed keys still have $O(\log n)$ bits, so we can radix sort them in linear time. From the resulting sorted list we can trivially extract the sorted sublist for each $S_i$.

We have now spent linear expected time on reducing the problem to dealing with the fields from Lemma 3 (ii), and their length is only a fraction $1/r = (\log q)/(\log n)$ of that of the input keys.

We are now ready to show Theorem 2. 

Proof of Theorem 2: We want to multisort n keys in groups of size at most q, where each key has length w. We are going to reduce the length of keys using Lemma 4. Using a result of Albers and Hagerup [1], we can sort each group in linear time when the length of keys is reduced to $w/(\log q \log\log q)$. Consequently, the number of times we need to apply Lemma 4 is $O(\log_{(\log n)/(\log q)}(\log q \log\log q)) = O((\log\log q)/(\log\log n - \log\log q))$. Since each application takes linear expected time, this gives an expected sorting bound of $O(n\lceil(\log\log q)/(\log\log n - \log\log q)\rceil)$. This expression simplifies to the one stated in the theorem. For $q = 2^{(\log n)^{1-\varepsilon}}$ the time for the algorithm is $O(n/\varepsilon)$.

□
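To make the round count in this proof explicit, here is a short worked derivation. As in the proof, it assumes that each application of Lemma 4 divides the key length by $\rho = (\log n)/(\log q)$, so that $\log \rho = \log\log n - \log\log q$. It also assumes that the Albers-Hagerup step applies once keys have at most $w/(\log q \log\log q)$ bits.

```latex
\begin{align*}
\frac{w}{\rho^{k}} \le \frac{w}{\log q\,\log\log q}
  &\iff \rho^{k} \ge \log q\,\log\log q
   \iff k \ge \frac{\log\log q + \log\log\log q}{\log\log n - \log\log q},
\intertext{so $O\big(\lceil(\log\log q)/(\log\log n - \log\log q)\rceil\big)$ rounds suffice. Since $\lceil a/b\rceil \le (a+b)/b$,}
n\left\lceil\frac{\log\log q}{\log\log n - \log\log q}\right\rceil
  &\le n\,\frac{\log\log n}{\log\log n - \log\log q}.
\intertext{For $q = 2^{(\log n)^{1-\varepsilon}}$ we have $\log\log q = (1-\varepsilon)\log\log n$, hence}
n\,\frac{\log\log n}{\varepsilon\log\log n} &= \frac{n}{\varepsilon}.
\end{align*}
```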

Comparing the expression of Theorem 2 with the $O(n\sqrt{\log\log q})$ bound obtained from [10] on each group, we see that Theorem 2 is better if $\log\log n - \log\log q = \omega(\sqrt{\log\log q})$, that is, if $q = 2^{(\log n)^{1-\omega(1/\sqrt{\log\log n})}}$.
5 More General Measures of Disorder on a RAM

This section aims at obtaining a RAM sorting algorithm that performs well both 
in terms of the number of inversions and the number of runs. This is achieved by 
bounding the time complexity in terms of the more general Osc disorder measure 
of Levcopoulos and Petersson [11]. 

For two integers a, b and a multiset of keys S, define $\mathrm{osc}(a, b, S) = |\{x \in S \mid \min(a, b) < x < \max(a, b)\}|$. For a sequence of keys $X = x_1, \ldots, x_n$, define

$$\mathrm{Osc}(X) = \sum_{i=1}^{n-1} \mathrm{osc}(x_i, x_{i+1}, X).$$

Clearly, $\mathrm{Osc}(X) \le n^2$. From [11], we know that Osc(X) is at most 4 times larger than the number of inversions in X and at most 2n times larger than the number of runs. Hence, an algorithm that is efficient in terms of Osc(X) is simultaneously efficient in terms of both the number of inversions and the number of runs.

In comparison-based models the best possible time complexity in terms of Osc(X) is $O(n\log(\mathrm{Osc}(X)/n))$. In this section we show that we can replace the logarithm by the time complexity t(n) of a priority queue, which in turn is the per-key cost of non-adaptive sorting of up to n keys [14]. More precisely:

Theorem 3. Assume that we can sort up to q integer keys in linear space and time $O(q \cdot t(q))$, where t is double-able (cf. Section 2). Then we can sort a sequence X of n keys in time



In particular, if t is convex, this is $O(n \cdot t(\mathrm{Osc}(X)/n))$.

The rest of this section proves Theorem 3. First we will show how to implement a priority queue with a certain property, and then show how to use it for Osc-adaptive sorting.

5.1 A Last-In-Fast-Out Priority Queue

In this section we describe a priority queue with the property that it is faster to delete keys that were recently inserted into the priority queue than to delete keys that have been in the priority queue for long. The time to delete a key x will depend on the number of inserts after key x was inserted. We call this property of a priority queue, which was previously achieved for the cache-oblivious model in [5], last-in-fast-out.

We define a Delete-Insert(x, y) operation on a priority queue as a delete operation on key x followed by an insertion of key y, where there may be any number of operations in between. For a key x in a priority queue, we denote by inserts(x) the number of keys inserted in the priority queue after x is inserted and before x is deleted from the priority queue.
Theorem 4. Assume that we can sort up to n integer keys in linear space and time $O(n \cdot t(n))$, where t is double-able (cf. Section 2). Then there exists a last-in-fast-out priority queue, with capacity for holding at most n keys at any time, supporting the operations Init() in O(1) time, FindMin() in O(1) time, Insert(x) in $O(t(n))$ time, and Delete-Insert(x, y) in amortized $O(t(\mathrm{inserts}(x)))$ time, using linear space.

Proof. According to the reduction in [14], the sorting bound assumed in the theorem means that there is a linear-space priority queue supporting FindMin() in O(1) time, Delete(x) in $O(t(n))$ time, and Insert(x) in $O(t(n))$ time, where n is the number of keys in the priority queue.

Our data structure consists of a number of priority queues, denoted by $PQ_1, PQ_2, \ldots$, with properties as described above. We let $n_i$ denote an upper bound on the size of $PQ_i$, where $n_i$ satisfies $\lceil \log t(n_i) \rceil = i$. Since t(n) is double-able, we can compute $n_i$ in time and space $O(n_i)$. The keys in each priority queue $PQ_i$ are also stored in a list $L_i$, ordered according to the time when the keys were inserted into $PQ_i$; i.e., the first key in the list is the most recently inserted key and the last is the first inserted key. There are links between the keys in $PQ_i$ and $L_i$. Each key in the data structure is stored in one of the priority queues and has a reference to its host. The minimum from each of the priority queues is stored in an atomic heap together with a reference to the priority queue it comes from. The atomic heap supports the operations Insert(x), Delete(x), FindMin(), and Find(x) in constant time as long as the number of priority queues is limited to log n [7]. There will never be a need for an atomic heap of size greater than log n. This is due to the definition of $n_i$ and because, as we will see in the description of Insert(x), a new priority queue will only be created if all smaller priority queues are full.

Init() is performed by creating an empty priority queue $PQ_1$ and an empty atomic heap prepared for a constant number of keys. This can clearly be done in O(1) time. Note that as the data structure grows we will need a larger atomic heap. Initializing an atomic heap for log n keys takes O(n) time and space, which is not too expensive, and the growth is handled by standard techniques for increasing the size of a data structure.

FindMin() is implemented by a FindMin() operation in the atomic heap, returning a reference to the priority queue from which the minimum comes, followed by a FindMin() in that priority queue, returning a reference to the minimum key. FindMin() can be performed in O(1) time.

Delete-Insert(x, y) is simply implemented as a Delete(x) followed by an Insert(y); hence we do not care about the pairing in the algorithm, only in the analysis.

Delete(x) is implemented as a Delete(x) in the priority queue $PQ_i$ in which the key resides, and a Find(x) in the atomic heap. If the key is in the atomic heap, then x is the minimum in $PQ_i$. In this case x is deleted from the atomic heap, the new minimum in $PQ_i$ (if any) is found by a FindMin() in $PQ_i$, and this key is inserted into the atomic heap. The time for the operations on



On Adaptive Integer Sorting 565 



the atomic heap is constant. Delete(x) and FindMin() in $PQ_i$ take time $O(t(n_i))$ and O(1), respectively. We will later argue that $t(n_i) = O(t(\mathrm{inserts}(x)))$.

When an Insert($x_1$) operation is performed, the key $x_1$ is inserted into $PQ_1$ and into $L_1$ as the first key. This may, however, make $PQ_1$ overfull, i.e., the number of keys in $PQ_1$ exceeds $n_1$. If this is the case, the last key in $L_1$, denoted $x_2$, is removed from $PQ_1$ and inserted into $PQ_2$. If the insertion of $x_1$ or the deletion of $x_2$ changes the minimum in $PQ_1$, then the old minimum is deleted from the atomic heap and the new minimum in $PQ_1$ is inserted into it. The procedure is repeated for $i = 2, 3, \ldots$, where $x_i$ is removed from $PQ_{i-1}$ and inserted into $PQ_i$. We choose $x_i$ as the last key in $L_{i-1}$. This is repeated until $PQ_i$ is not full or there are no more priority queues. In the latter case a new priority queue is created where the key can be inserted. Assume that an insertion ends in $PQ_i$. The time for the deletions and insertions in $PQ_1, \ldots, PQ_i$ is $\sum_{j=1}^{i} O(t(n_j))$. Since the sizes of the priority queues grow in such a way that the time for an insertion roughly doubles from $PQ_j$ to $PQ_{j+1}$, this sum is bounded by $O(t(n_i))$. All other operations during an insert take less time. In the worst case we do not find a priority queue that is not full until we reach the last one. Hence, the time for insert is $O(t(n))$.

Now it is easy to see that if key x is stored in $PQ_i$, then there must have been at least $n_{i-1}$ inserts after x was inserted. The bound for the delete part of Delete-Insert(x, y) follows since $t(n_{i-1}) = \Theta(t(n_i))$.

To show that the insert part of a Delete-Insert(x, y) operation only increases the time by a constant factor, we use a potential argument: the delete operation will pay for the insert. For the sake of argument, say that whenever a key x is deleted from $PQ_i$, the delete operation pays double the amount of time it uses and leaves $O(t(n_i))$ time for the insertion of a key in its place. Whenever a key y is inserted into the data structure, the first non-full priority queue $PQ_i$ is the place where the insertion stops. This insertion is paid for by the saved potential, unless the insertion does not fill the place of a deleted key. This only happens as many times as the number of Insert(x) operations not coupled with a deletion. Hence, the total extra time is $O(t(n))$ for each such insertion. The amortized bound on Delete-Insert(x, y) follows.

The space usage of the data structure is clearly bounded by O(n).
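The following Python sketch shows only the level structure used in the proof. It does not reproduce the proof's time bounds. It rests on several assumptions of ours. The capacities are cap(i) = 2^(2^(i+1)), which fit t(n) = log n because log cap(i) doubles per level. `heapq` with lazy deletion replaces each $PQ_i$, and a `deque` replaces $L_i$. A scan over the level minima replaces the atomic heap. The class and method names are hypothetical.

```python
import heapq
from collections import deque
from itertools import count


class LastInFastOutPQ:
    """Levels of bounded priority queues; new keys enter level 0 and the
    oldest key of an overfull level moves one level down."""

    def __init__(self):
        self.heaps = []    # heaps[i]: (key, id) pairs, cleaned lazily (PQ_i)
        self.ages = []     # ages[i]: ids by arrival, newest on the left (L_i)
        self.sizes = []    # sizes[i]: number of live keys on level i
        self.level = {}    # id -> level of every live key
        self.keys = {}     # id -> key of every live key
        self._ids = count()

    @staticmethod
    def cap(i):
        return 2 ** (2 ** (i + 1))   # 4, 16, 256, 65536, ...

    def insert(self, key):
        item = next(self._ids)
        self.keys[item] = key
        self._place(0, item)
        return item

    def _place(self, i, item):
        if i == len(self.heaps):                  # open a new, larger level
            self.heaps.append([])
            self.ages.append(deque())
            self.sizes.append(0)
        self.level[item] = i
        heapq.heappush(self.heaps[i], (self.keys[item], item))
        self.ages[i].appendleft(item)
        self.sizes[i] += 1
        if self.sizes[i] > self.cap(i):
            old = self.ages[i].pop()              # oldest entry on level i
            while self.level.get(old) != i:       # skip ids already deleted
                old = self.ages[i].pop()
            self.sizes[i] -= 1
            self._place(i + 1, old)               # its old heap entry is now stale

    def _top(self, i):
        heap = self.heaps[i]
        while heap and self.level.get(heap[0][1]) != i:
            heapq.heappop(heap)                   # drop stale entries
        return heap[0] if heap else None

    def find_min(self):
        tops = [t for t in map(self._top, range(len(self.heaps))) if t is not None]
        return min(tops) if tops else None        # (key, id) or None

    def delete(self, item):
        self.sizes[self.level.pop(item)] -= 1     # entries become stale lazily
        del self.keys[item]

    def delete_min(self):
        key, item = self.find_min()
        self.delete(item)
        return key
```

Recently inserted keys sit on the small levels, and a key reaches level i only after enough later inserts have pushed it down. This is the invariant behind the bound $t(n_i) = O(t(\mathrm{inserts}(x)))$.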

5.2 Our General Adaptive Sorting Algorithm 

Theorem 5. Suppose that we have a linear-space last-in-fast-out priority queue for integer keys, with amortized time complexity $O(t(n))$ for insertion and $O(t(\mathrm{inserts}(x)))$ for deleting the minimum x followed by an insertion. Also, suppose t is non-decreasing and double-able, and that $t(n) = O(\log n)$. Then there is an integer sorting algorithm that uses linear space and, on input $X = x_1, \ldots, x_n$, uses time



In particular, if t is convex, this is $O(n \cdot t(\mathrm{Osc}(X)/n))$.
Proof. Initially we split the input into $O(n/\log n)$ groups of $O(\log n)$ keys and sort each group in linear time using an atomic heap. Call the resulting list X'. By convexity of t, and since $t(n) = O(\log n)$, the sum

is larger than the sum in the theorem by at most O(n). To sort X', we first put the minimum of each group into the last-in-fast-out priority queue in O(n) time. We then repeatedly remove the minimum from the priority queue and insert the next key from the same group, if any. The total time for reporting the first key from each group is clearly O(n). Insertion of a key $x'_i$ that is not first in a group is amortized for free, and the deletion cost is $O(t(\mathrm{inserts}(x'_i))) = O(t(\mathrm{osc}(x'_{i-1}, x'_i, X')))$. Summing over all keys, we get the desired sum.
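The proof's merge can be simulated in Python to see the link between inserts and osc. In this sketch a binary heap (`heapq`) stands in for the last-in-fast-out priority queue, and the group size g is a parameter rather than Θ(log n). The sketch also assumes distinct keys. Under that assumption, every key inserted while $x'_i$ waits in the queue replaces some deleted key z with $x'_{i-1} < z < x'_i$, so $\mathrm{inserts}(x'_i) \le \mathrm{osc}(x'_{i-1}, x'_i, X')$. Function names are ours.

```python
import heapq


def osc(a, b, keys):
    """osc(a, b, S): number of keys of S strictly between min(a, b) and max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return sum(1 for x in keys if lo < x < hi)


def osc_adaptive_sort(xs, g):
    """Sort xs as in the proof of Theorem 5, with heapq as the priority queue.

    Returns the sorted keys and, for every key x'_i that is not first in its
    group, the pair (inserts(x'_i), osc(x'_{i-1}, x'_i, X'))."""
    groups = [sorted(xs[j:j + g]) for j in range(0, len(xs), g)]
    x_prime = [x for grp in groups for x in grp]          # the list X'
    heap, stamp, inserted = [], {}, 0

    def push(k, pos):
        nonlocal inserted
        heapq.heappush(heap, (groups[k][pos], k, pos))
        inserted += 1
        stamp[(k, pos)] = inserted    # number of inserts so far, this one included

    for k in range(len(groups)):
        push(k, 0)                    # first key of every group
    out, pairs = [], []
    while heap:
        x, k, pos = heapq.heappop(heap)
        out.append(x)
        if pos > 0:
            pairs.append((inserted - stamp[(k, pos)],
                          osc(groups[k][pos - 1], x, x_prime)))
        if pos + 1 < len(groups[k]):
            push(k, pos + 1)          # next key of the same group
    return out, pairs
```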

Theorems 4 and 5 together imply Theorem 3. As an application, using the $O(n\sqrt{\log\log n})$ expected time sorting from [10], we get that a sequence X of n keys can be sorted in expected time $O(n\sqrt{\log\log(\mathrm{Osc}(X)/n)})$.

6 Final Remarks 

An anonymous reviewer has pointed out an alternative proof of Theorem 1. 
The main insight is that for a sequence of n keys with qn inversions, there is a 
comparison-based linear time algorithm that splits the keys into a sequence of 
groups of size at most q such that no key in a group is smaller than any key in 
a previous group. In other words, all that remains to sort the entire key set is 
to multisort the groups. 

The groups can be maintained incrementally, inserting keys in the order they appear in the input. Initially we have one group consisting of the first q keys. We maintain the invariant that all groups contain between q/2 and q keys. For each group, the minimum key and the group size are maintained. The group of a key x is found by a linear search, from the last group backwards, for the largest minimum key no larger than x. The total time for these searches is O(n) since each search step can be associated with at least q/2 inversions. If the size of the group in which x is to be inserted is q, we split the group (including x) into two groups, using a linear time median finding algorithm. The time for these splittings is constant amortized per inserted key.
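A Python sketch of this grouping procedure follows. It makes two assumptions: q ≥ 2, and sorting a full group stands in for linear-time median finding. Concatenating the groups after sorting each one individually then yields the sorted sequence. The function name is ours.

```python
def group_by_inversions(keys, q):
    """Split keys into groups of at most q keys, each group's keys no smaller
    than those of every earlier group (the reviewer's construction)."""
    if q < 2:
        raise ValueError("q must be at least 2")
    groups = [list(keys[:q])] if keys else []
    mins = [min(g) for g in groups]
    for x in keys[q:]:
        # Scan backwards for the last group whose minimum is <= x; each group
        # skipped holds >= q/2 earlier keys larger than x (inversions).
        j = len(groups) - 1
        while j > 0 and mins[j] > x:
            j -= 1
        group = groups[j]
        group.append(x)
        mins[j] = min(mins[j], x)
        if len(group) > q:
            # Split around the median; both halves keep at least q/2 keys.
            group.sort()
            half = len(group) // 2
            groups[j:j + 1] = [group[:half], group[half:]]
            mins[j:j + 1] = [group[0], group[half]]
    return groups
```

For nearly sorted input, almost every key lands in the last group after a single comparison, which is where the O(n) search bound comes from.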
Abstract. We fix two rectangles with integer dimensions. We give a quadratic time
algorithm which, given a polygon F as input, produces a tiling of F with translated 
copies of our rectangles (or indicates that there is no tiling). Moreover, we prove 
that any pair of tilings can be linked by a sequence of local transformations of 
tilings, called flips. This study is based on the use of J. H. Conway’s tiling groups 
and extends the results of C. Kenyon and R. Kenyon (limited to the case when 
each rectangle has a side of length 1). 



1 Introduction 

In 1990, J. H. Conway and J. C. Lagarias [3] introduced the notion of tiling group. They created a way to study tiling problems with an algebraic approach. This work was continued by W. P. Thurston [16], who introduced the notion of height function (also introduced independently in the statistical physics literature; see [2] for a review). The tools cited above have been used and generalized by different authors [5], [6], [7], [9], [12], [13], [14], [15], and some very strong algorithmic and structural results about tilings have been obtained.

In this paper, we consider the problem of tiling a rectilinear polygon (i.e., a not necessarily convex, but simply connected, polygon whose sides are horizontal and vertical line segments) with rectangular tiles. This problem was previously considered by C. Kenyon and R. Kenyon [5]. These authors obtained a linear time algorithm for tiling a rectilinear polygon by translates of m × 1 and 1 × n' rectangles, m and n' denoting positive fixed integers. For this, they used Conway-Lagarias type tiling groups, finding an appropriate generalization of the "height function" used by W. P. Thurston for their groups. They also considered tiling using 2 × 3 and 3 × 2 rectangles, where it was asserted that a quadratic time algorithm exists (a brief sketch of an approach without full details is given in the paper). It is also claimed that the method can be extended to the case of tilings by k × l and l × k rectangles, for any pair (k, l) of positive integers.

The present paper generalizes the previous paper, considering tiling by two types of tiles, which are translates (we do not allow rotations) of m × n and m' × n' rectangles, for an arbitrary fixed sequence (m, m', n, n') of positive integers. Our analysis also uses Conway-Lagarias type tiling groups. The key new idea needed for the general case is in Proposition 2: we introduce a new technique to reduce the problem by cutting the polygon, using algebraic arguments.

The main results of the paper are: 

- the flip connectivity (Theorem 1): any pair of tilings of the same polygon can be linked by a sequence of local transformations of tilings, called flips,

- a decision algorithm for tileability of a (rectilinear) polygon P that is quadratic in the area A(P) (Theorem 2).

An important restriction in this work is that tiled regions are assumed to contain no interior holes. We recall that it is NP-complete to decide whether a general connected region (with holes allowed) is tileable by translates of m × 1 and 1 × n' rectangles, with m and n' being integers such that m > 3 and n' > 2 [1].

The paper is organized as follows: we first give the definitions and basic concepts (Section 2). Afterwards, we explain our new techniques to cut tilings (Section 3), and we show how to combine them with the machinery of height functions (Section 4) to get a quadratic tiling algorithm and prove the flip connectivity.



2 Basic Tools 

The Square Lattice. Let Λ be the square lattice of the Euclidean plane. A cell of Λ is a square of unit sides whose vertices have integer coordinates. The vertices and edges of Λ are those of the cells of Λ. Two vertices are neighbors if they are linked by an edge.

A path P of Λ is a finite sequence of vertices such that two consecutive vertices are neighbors. A corner of a path P is the origin or final vertex, or a vertex v of P which is not equal to the center of the line segment linking the predecessor and the successor of v in P. A corner v is weak if its predecessor and its successor are equal; otherwise the corner is strong (the origin and final vertices of the path are strong corners). A path can be defined by the sequence $(v_0, v_1, \ldots, v_p)$ of its corners. In this paper, we encode paths in this way. A move of a path is the subpath linking two consecutive corners. We have two kinds of moves: horizontal and vertical ones. The algebraic length of a horizontal (respectively vertical) move from $v_i = (x_i, y_i)$ to $v_{i+1} = (x_{i+1}, y_{i+1})$ is the difference $x_{i+1} - x_i$ (respectively $y_{i+1} - y_i$).
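These definitions can be made concrete with a small Python sketch that encodes a lattice path by its corners and lists the algebraic lengths of its moves. It assumes that a path is given as a list of integer (x, y) vertices. The function names are ours.

```python
def corner_encoding(path):
    """Return the corners of a lattice path as (vertex, 'strong' | 'weak') pairs."""
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if abs(x1 - x0) + abs(y1 - y0) != 1:
            raise ValueError("consecutive vertices must be neighbors")
    result = []
    for k, v in enumerate(path):
        if k == 0 or k == len(path) - 1:
            result.append((v, "strong"))          # origin and final vertex
            continue
        prev, nxt = path[k - 1], path[k + 1]
        if prev[0] + nxt[0] == 2 * v[0] and prev[1] + nxt[1] == 2 * v[1]:
            continue                              # v is the midpoint: not a corner
        result.append((v, "weak" if prev == nxt else "strong"))
    return result


def moves(corner_list):
    """Kind and algebraic length of each move between consecutive corners."""
    result = []
    for (a, _), (b, _) in zip(corner_list, corner_list[1:]):
        if a[1] == b[1]:
            result.append(("horizontal", b[0] - a[0]))
        else:
            result.append(("vertical", b[1] - a[1]))
    return result
```

For the path (0,0), (1,0), (2,0), (2,1), (2,2), (1,2), (2,2), the corners are (0,0), (2,0), (2,2), (1,2), (2,2). Only (1,2), where the path turns back, is weak. The moves have algebraic lengths +2 (horizontal), +2 (vertical), -1 and +1 (horizontal).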

A (finite) figure F of Λ is a (finite) union of (closed) cells of Λ. Its boundary is denoted by ∂F. The lower boundary is the union of open horizontal line segments ]v, v'[ included in ∂F such that the interior of F is above ]v, v'[ (i.e., more formally, F contains a rectangle whose bottom side is [v, v']). One defines in a similar way the left, right, and upper boundaries of F. A figure F is simply connected if F and its complement $\mathbb{R}^2 \setminus F$ are both connected. A finite simply connected figure P is called a polygon of Λ. The boundary of a polygon P canonically induces a cycle in Λ, which is called the boundary cycle of P. In this paper we only consider polygons, except when explicitly stated otherwise.
Tilings. We fix a pair $\{R, R'\}$ of rectangles with vertical and horizontal sides and integer dimensions. The width (i.e. horizontal dimension) of $R$ is denoted by $m$ and the width of $R'$ by $m'$. The height (i.e. vertical dimension) of $R$ is denoted by $n$ and the height of $R'$ by $n'$. An $R$-tile (respectively $R'$-tile) is a translated copy of the rectangle $R$ (respectively $R'$). A tiling $T$ of a figure $F$ is a set of tiles with pairwise disjoint interiors (i.e. there is no overlap) such that the union of the tiles of $T$ equals $F$ (i.e. there is no gap). A tile $t$ is cut by a path $P$ if there exists an open line segment $]v, v'[$ linking two consecutive vertices of $P$ that is included in the interior of the tile $t$. Let $G_T = (V_T, E_T)$ denote the subgraph of $\Lambda$ such that $V_T$ (respectively $E_T$) is the set of vertices (respectively edges) of $\Lambda$ which are on the boundary of a tile of $T$. Obviously, the set $V_T$ (respectively $E_T$) contains all the vertices (respectively edges) of $\Lambda$ which are in $\delta F$.
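The definitions of a tiling and of a cut tile can be checked mechanically. The minimal Python sketch below uses two conventions of our own, not the paper's: a figure is a set of unit cells named by their lower-left corners, and a tile is a tuple `(x, y, width, height)`. The helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]            # unit cell named by its lower-left corner
Tile = Tuple[int, int, int, int]  # (x, y, width, height), lower-left corner at (x, y)

def tile_cells(t: Tile) -> Set[Cell]:
    x, y, w, h = t
    return {(x + i, y + j) for i in range(w) for j in range(h)}

def is_tiling(tiles: Iterable[Tile], figure: Set[Cell], dims: Set[Tuple[int, int]]) -> bool:
    """True if every tile has an allowed (width, height), no two tiles overlap,
    and the tiles cover exactly the cells of the figure."""
    covered: Set[Cell] = set()
    for t in tiles:
        if (t[2], t[3]) not in dims:
            return False
        cells = tile_cells(t)
        if cells & covered:
            return False  # overlap
        covered |= cells
    return covered == figure  # no gap and nothing outside F

def cuts(tile: Tile, v: Tuple[int, int], w: Tuple[int, int]) -> bool:
    """True if the open unit segment ]v, w[ between adjacent lattice vertices
    lies in the interior of the tile."""
    x, y, tw, th = tile
    (a, b), (c, d) = sorted([v, w])
    if b == d:  # horizontal segment from (a, b) to (a + 1, b)
        return y < b < y + th and x <= a and c <= x + tw
    return x < a < x + tw and y <= b and d <= y + th  # vertical segment
```

For example, with $m = 2$, $n = 3$, $m' = 3$, $n' = 2$, three $R$-tiles in a row under two $R'$-tiles tile a $6 \times 5$ rectangle.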

The Preprocessing. The tiling problem can easily be reduced to the case when $\gcd(m, m') = \gcd(n, n') = 1$, as follows. Consider the unique tiling $\Theta$ of the figure $F$ by translated copies of a $\gcd(m, m') \times \gcd(n, n')$ rectangle, called bricks. The tiling $\Theta$ is a refinement of any "classical" tiling $T$ with $R$-tiles and $R'$-tiles, i.e. each brick of $\Theta$ is included in a tile of $T$.

Two bricks $\theta$ and $\theta'$ of $\Theta$ are called equivalent if they are respectively the first and last elements of a finite sequence of bricks in which two consecutive bricks share a whole side. Note that two bricks included in the same tile of $T$ are equivalent (i.e. the boundary of an equivalence class cuts no tile of $T$), and that the union of all the bricks of an equivalence class forms a polygon (if such a union had a hole, then a brick placed at a corner of the hole would create a contradiction). Thus, the problem can be simplified as follows. We first compute the equivalence classes (this can easily be done in time linear in the area of the polygon). Afterwards, we study tilings of each equivalence class separately. Since all bricks in a class are equivalent, we can change the units so as to consider a brick as the new unit rectangle.
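One way to compute the equivalence classes of bricks is a breadth-first search over side-sharing neighbors, which runs in expected linear time with a hash set. The minimal Python sketch below assumes that bricks are given by their lower-left corners and have size `gw` x `gh`, that is, $\gcd(m, m') \times \gcd(n, n')$. The function name is hypothetical. Because all bricks have the same size, two bricks share a whole side exactly when their corners differ by $(\pm gw, 0)$ or $(0, \pm gh)$.

```python
from collections import deque
from typing import List, Set, Tuple

def brick_classes(bricks: Set[Tuple[int, int]], gw: int, gh: int) -> List[List[Tuple[int, int]]]:
    """Group bricks into equivalence classes; side-sharing bricks are equivalent."""
    remaining = set(bricks)
    classes = []
    while remaining:
        start = remaining.pop()
        component = [start]
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            # the four bricks that would share a whole side with (x, y)
            for nb in ((x + gw, y), (x - gw, y), (x, y + gh), (x, y - gh)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.append(nb)
                    queue.append(nb)
        classes.append(sorted(component))
    return classes
```

After grouping, dividing the coordinates of each class by `gw` and `gh` performs the change of units described above.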

This transformation allows us to assume that $\gcd(m, m') = \gcd(n, n') = 1$. In the rest of the paper, we assume that this simplification has been done.

3 Tiling Groups and Cut Lines 

The next two sections are devoted to the main case, called the non-degenerate case, in which $m > 1$, $n > 1$, $m' > 1$ and $n' > 1$.

We introduce the groups $G_1 = \mathbb{Z}/m\mathbb{Z} * \mathbb{Z}/n'\mathbb{Z}$, i.e. the free product (see for example [8] for details about group theory) of the groups $\mathbb{Z}/m\mathbb{Z}$ and $\mathbb{Z}/n'\mathbb{Z}$, and $G_2 = \mathbb{Z}/m'\mathbb{Z} * \mathbb{Z}/n\mathbb{Z}$. We only give properties and notations for $G_1$, since they are symmetric for $G_2$. Recall that there exists a canonical surjection (denoted by $s_1$) from $G_1$ to the Cartesian product $\mathbb{Z}/m\mathbb{Z} \times \mathbb{Z}/n'\mathbb{Z}$, and that each element $g$ of $G_1$ can be written in a unique way as $g = \prod_{i=1}^{p} g_i$, with the following properties:

- for each integer $i$ such that $1 \le i \le p$, $g_i$ is an element of the set $(\mathbb{Z}/m\mathbb{Z})^* \cup (\mathbb{Z}/n'\mathbb{Z})^*$,

- for each integer $i$ such that $1 \le i < p$, $g_i$ is an element of $\mathbb{Z}/m\mathbb{Z}$ if and only if $g_{i+1}$ is an element of $\mathbb{Z}/n'\mathbb{Z}$.
Fig. 1. The graph of $G_1$ (with $m = 4$ and $n' = 3$).



This expression is called the canonical expression of $g$, and the integer $p$ is called the length of $g$ (denoted by $l(g)$). This defines an order relation on $G_1$: we say that $g \le g'$ if the canonical expression of $g$ is a prefix of the canonical expression of $g'$.
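Canonical expressions can be computed by the usual reduction in a free product of two cyclic groups. Letters are appended one by one, adjacent letters of the same factor are merged, and any letter that becomes zero is dropped. The minimal Python sketch below stores elements as lists of `(factor, value)` pairs, with factor 0 for $\mathbb{Z}/m\mathbb{Z}$ and factor 1 for $\mathbb{Z}/n'\mathbb{Z}$, and `mods = (m, n')`. This representation is ours, not the paper's.

```python
from typing import List, Tuple

# A canonical word: (factor, value) letters, all values non-zero,
# consecutive letters taken from different factors.
Word = List[Tuple[int, int]]

def multiply(u: Word, v: Word, mods: Tuple[int, int]) -> Word:
    """Canonical expression of u * v (u must already be canonical)."""
    result = list(u)
    for factor, value in v:
        value %= mods[factor]
        if value == 0:
            continue  # identity letter
        if result and result[-1][0] == factor:
            merged = (result[-1][1] + value) % mods[factor]
            result.pop()
            if merged:
                result.append((factor, merged))
        else:
            result.append((factor, value))
    return result

def inverse(g: Word, mods: Tuple[int, int]) -> Word:
    return [(f, (-x) % mods[f]) for f, x in reversed(g)]

def length(g: Word) -> int:
    return len(g)  # the integer p of the canonical expression

def prefix_leq(g: Word, h: Word) -> bool:
    """g <= h iff the canonical expression of g is a prefix of that of h."""
    return h[: len(g)] == g
```

With `mods = (4, 3)` (the setting of Fig. 1), multiplying $1 \in \mathbb{Z}/4\mathbb{Z}$ by $3 \in \mathbb{Z}/4\mathbb{Z}$ gives the empty word, that is, the unit element.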

Two distinct elements $g$ and $g'$ of $G_1$ are H-equivalent if there exists an element $x$ of $\mathbb{Z}/m\mathbb{Z}$ such that $g' = gx$. They are V-equivalent if there exists an element $y$ of $\mathbb{Z}/n'\mathbb{Z}$ such that $g' = gy$. They are neighbors if they are H-equivalent or V-equivalent. Each element of $G_1$ is the only element in the intersection of its H-equivalence class and its V-equivalence class.

The graph of $G_1$ is the symmetric graph whose vertices are the elements of $G_1$ and in which, for each pair $(g, g')$ of distinct elements of $G_1$, $(g, g')$ is an edge if and only if $g$ and $g'$ are neighbors*.
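In the same `(factor, value)` representation, the H-equivalent and V-equivalent neighbors of an element are obtained by right multiplication by a single non-zero letter. The minimal sketch below uses hypothetical helper names. It shows in particular that each vertex of the graph of $G_1$ has $(m - 1) + (n' - 1)$ neighbors.

```python
from typing import List, Tuple

Word = List[Tuple[int, int]]  # canonical word: (factor, value) letters

def times_letter(g: Word, factor: int, value: int, mods: Tuple[int, int]) -> Word:
    """Canonical form of g * value, with value taken in Z/mods[factor]Z."""
    value %= mods[factor]
    g = list(g)
    if value == 0:
        return g
    if g and g[-1][0] == factor:
        merged = (g[-1][1] + value) % mods[factor]
        g.pop()
        if merged:
            g.append((factor, merged))
    else:
        g.append((factor, value))
    return g

def neighbors(g: Word, mods: Tuple[int, int]) -> Tuple[List[Word], List[Word]]:
    """H-equivalent elements g*x and V-equivalent elements g*y, excluding g itself."""
    h_class = [times_letter(g, 0, x, mods) for x in range(1, mods[0])]
    v_class = [times_letter(g, 1, y, mods) for y in range(1, mods[1])]
    return h_class, v_class
```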

Note that each path of this graph linking an element $g$ to an element $g'$ has to pass through each element $g''$ such that $1_{G_1} < g^{-1}g'' < g^{-1}g'$. Informally, the graph of elements has the structure of an infinite tree of equivalence classes.

Tiling Function. Let $P = (v_1, \ldots, v_p)$ be a path (according to our convention, the vertices of the sequence are the corners of $P$). This path canonically induces an element $g_P$ of $G_1$ as follows. Each horizontal (respectively vertical) move gives an element of $\mathbb{Z}/m\mathbb{Z}$ (respectively $\mathbb{Z}/n'\mathbb{Z}$) equal to its algebraic length, and $g_P$ is obtained by successively multiplying the elements given by the successive moves. Notice that any path describing the contour of a tile induces the unit element. Thus, tilings can be encoded using the groups $G_1$ and $G_2$, as we see below.
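The element $g_P$ can be computed by multiplying the letters given by the successive moves. The following Python sketch uses hypothetical names and the same `(factor, value)` representation of canonical expressions, with `mods = (m, n')`. On an example with $m = 4$, $n = 5$, $m' = 3$, $n' = 3$, it checks that the contours of an $R$-tile and of an $R'$-tile both induce the unit element, as noted above.

```python
from typing import List, Tuple

Word = List[Tuple[int, int]]  # canonical word: (factor, value) letters

def times_letter(g: Word, factor: int, value: int, mods: Tuple[int, int]) -> Word:
    value %= mods[factor]
    g = list(g)
    if value == 0:
        return g
    if g and g[-1][0] == factor:
        merged = (g[-1][1] + value) % mods[factor]
        g.pop()
        if merged:
            g.append((factor, merged))
    else:
        g.append((factor, value))
    return g

def path_element(corner_seq: List[Tuple[int, int]], mods: Tuple[int, int]) -> Word:
    """g_P: product of the algebraic lengths of the moves, horizontal ones in
    Z/mZ (factor 0) and vertical ones in Z/n'Z (factor 1)."""
    g: Word = []
    for (x0, y0), (x1, y1) in zip(corner_seq, corner_seq[1:]):
        if y0 == y1:
            g = times_letter(g, 0, x1 - x0, mods)
        else:
            g = times_letter(g, 1, y1 - y0, mods)
    return g

G1 = (4, 3)  # (m, n')
r_tile = [(0, 0), (4, 0), (4, 5), (0, 5), (0, 0)]   # contour of an m x n tile
rp_tile = [(0, 0), (3, 0), (3, 3), (0, 3), (0, 0)]  # contour of an m' x n' tile
print(path_element(r_tile, G1), path_element(rp_tile, G1))
```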

Definition 1. (tiling function) Let $T$ be a tiling of a polygon $F$. A tiling function induced by $T$ is a mapping $f_{1,T}$ from the set $V_T$ to $G_1$ such that, for each edge $[v = (x, y), v' = (x', y')]$ cutting no tile of $T$:

— if $x = x'$, then $f_{1,T}(v') = f_{1,T}(v)\,(y' - y)$, with $y' - y$ seen as an element of $\mathbb{Z}/n'\mathbb{Z}$,

— if $y = y'$, then $f_{1,T}(v') = f_{1,T}(v)\,(x' - x)$, with $x' - x$ seen as an element of $\mathbb{Z}/m\mathbb{Z}$.



Proposition 1. (J. H. Conway) Let $F$ be a figure, $T$ be a tiling of $F$, $v_0$ be a vertex of the boundary of $F$ and $g_0$ be an element of $G_1$. If $F$ is connected (respectively is a polygon), then there exists at most one (respectively exactly one) tiling function $f_{1,T}$ induced by $T$ such that $f_{1,T}(v_0) = g_0$.

* In the language of group theory, the graph of elements is the Cayley graph of $G_1$ with respect to the generating set $(\mathbb{Z}/m\mathbb{Z})^* \cup (\mathbb{Z}/n'\mathbb{Z})^*$, with the labels removed.

We now study tilings of a polygon $F$. For each tiling $T$, we use the unique tiling function $f_{1,T}$ such that $f_{1,T}(O)$ is the unit element of $G_1$, where $O$ denotes a fixed lower left corner of the boundary of $F$. Note that, for each vertex $v$ of $\delta F$, $f_{1,T}(v)$ does not depend on the tiling $T$. This allows us to drop the tiling index and write $f_1(v)$, and $f_1(v)$ can be computed without knowledge of $T$ by following the boundary of $F$. We also note that, for each vertex $v$ of $V_T$, the value $s_1 \circ f_{1,T}(v)$ is the pair of coordinates of $v$ (respectively modulo $m$ and modulo $n'$) and consequently does not depend on $T$.
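The computation of $f_1$ along the boundary can be sketched as follows. As an input convention of our own, the boundary is given as the closed list of its corners, starting and ending at $O$. Each unit step multiplies by one letter. If some vertex would receive two different values (in particular, if the walk does not return to the identity at $O$), then no tiling function exists, and hence, by Proposition 1, no tiling exists. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Word = List[Tuple[int, int]]  # canonical word: (factor, value) letters

def times_letter(g: Word, factor: int, value: int, mods: Tuple[int, int]) -> Word:
    value %= mods[factor]
    g = list(g)
    if value == 0:
        return g
    if g and g[-1][0] == factor:
        merged = (g[-1][1] + value) % mods[factor]
        g.pop()
        if merged:
            g.append((factor, merged))
    else:
        g.append((factor, value))
    return g

def boundary_values(corner_cycle: List[Tuple[int, int]], mods: Tuple[int, int]):
    """f_1 at every boundary vertex, with f_1(O) = identity for O = corner_cycle[0].
    Returns (values, ok); ok is False if a vertex gets two different values."""
    origin = corner_cycle[0]
    values: Dict[Tuple[int, int], Word] = {origin: []}
    g: Word = []
    x, y = origin
    for (x0, y0), (x1, y1) in zip(corner_cycle, corner_cycle[1:]):
        factor = 0 if y0 == y1 else 1  # horizontal moves live in Z/mZ
        delta = (x1 - x0) if factor == 0 else (y1 - y0)
        step = 1 if delta > 0 else -1
        for _ in range(abs(delta)):
            if factor == 0:
                x += step
            else:
                y += step
            g = times_letter(g, factor, step, mods)
            if (x, y) in values and values[(x, y)] != g:
                return values, False  # contradiction: F admits no tiling
            values[(x, y)] = g
    return values, True
```

On an $8 \times 5$ rectangle with `mods = (4, 3)`, the walk closes up, and at every boundary vertex $s_1 \circ f_1$ equals the coordinates modulo $(4, 3)$. On a $1 \times 1$ square it does not close up, so that square cannot be tiled.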

Cutting Using Algebraic Constraints 

Definition 2. A special path $P$ of $F$ is a path defined by a sequence $(v_1, \ldots, v_p)$ of successive corners such that:

— the vertex $v_1$ is on the lower boundary of $F$, the vertex $v_p$ is on the boundary of $F$, and, for each integer $i$ such that $1 \le i < p$, the line segment $[v_i, v_{i+1}]$ is included in $F$,

— for each odd integer $i$ such that $1 \le i < p$, there exists a non-null integer $k_i$ such that $-m' < k_i < m'$ and $v_{i+1} = v_i + (k_i m, 0)$,

— for each even integer $i$ such that $1 \le i < p$, there exists a non-null integer $k_i$ such that $-n < k_i < n$ and $v_{i+1} = v_i + (0, k_i n')$,

— the element $(f_2(v_1))^{-1} f_2(v_p)$ is the element of $G_2$ induced by the path $P$, i.e. $\prod_{i=1}^{p-1} r_i k_i$, with $r_i = n'$ for $i$ even and $r_i = m$ for $i$ odd.
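The shape constraints of Definition 2 are easy to test. Horizontal moves of length $k_i m$ with $0 < |k_i| < m'$ must alternate with vertical moves of length $k_i n'$ with $0 < |k_i| < n$. The sketch below checks only these constraints. It does not check the boundary, inclusion, or $f_2$ conditions, which depend on $F$. The names are hypothetical. Since $\gcd(m, m') = \gcd(n, n') = 1$, every letter $k_i m \bmod m'$ and $k_i n' \bmod n$ is non-zero, so the induced word in $G_2$ is already canonical and has length $p - 1$.

```python
from typing import List, Optional, Tuple

def special_move_factors(corners: List[Tuple[int, int]], m: int, n: int,
                         mp: int, np_: int) -> Optional[List[int]]:
    """[k_1, ..., k_{p-1}] if the moves have the shape required by Definition 2, else None.
    Odd moves: (k*m, 0) with 0 < |k| < m'.  Even moves: (0, k*n') with 0 < |k| < n."""
    ks = []
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(corners, corners[1:]), start=1):
        if i % 2 == 1:
            delta, other, unit, bound = x1 - x0, y1 - y0, m, mp
        else:
            delta, other, unit, bound = y1 - y0, x1 - x0, np_, n
        if other != 0 or delta % unit != 0:
            return None
        k = delta // unit
        if k == 0 or abs(k) >= bound:
            return None
        ks.append(k)
    return ks

def induced_g2_word(ks: List[int], m: int, n: int, mp: int, np_: int) -> List[Tuple[int, int]]:
    """Letters r_i k_i of the induced element of G_2 = Z/m'Z * Z/nZ (factor 0 = Z/m'Z)."""
    return [(0, (k * m) % mp) if i % 2 == 1 else (1, (k * np_) % n)
            for i, k in enumerate(ks, start=1)]
```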

The notion of special path, together with the proposition below, forms the main new idea of this paper. There exists an easier proof [4] in the particular case when $m' = n = 2$, since the graph of $G_2$ then has a linear structure.

Proposition 2. Let $T$ be a tiling and $P$ be a special path of a polygon $F$. The path $P$ cuts no tile of $T$.

Proof. We first construct from $P$ another path $P_T$ as follows. For each tile $t$ which is cut by $P$, let $P_{t,\mathrm{cross}}$ denote the sub-path of $P$ which crosses $t$. The path $P_T$ is obtained by replacing in $P$ each sub-path $P_{t,\mathrm{cross}}$ by the shortest path $P_{t,\mathrm{surround}}$ on the boundary of $t$ with the same endpoints as $P_{t,\mathrm{cross}}$. If two such paths are possible, any convention can be used; one can state, for example, that $t$ is surrounded by the left. Notice that $P_{t,\mathrm{surround}}$ is formed from two or three moves. Informally, the path $P_T$ is an approximation of $P$ which cuts no tile of $T$.

Let $i$ be an integer such that $1 \le i \le p$. We define $v'_i$ as the last strong corner of $P_T$ such that $f_{2,T}(v'_i)$ is equal to the element $g_{P_i}$ induced by the sub-path $P_i$ of $P$ from $v_1$ to $v_i$, i.e. $g_{P_i} = \prod_{j=1}^{i-1} r_j k_j$ (with $r_j = n'$ for $j$ even and $r_j = m$ for $j$ odd). This corner exists because of the structural properties of the graphs of elements and classes of $G_2$, especially the tree structure of the graph of classes. The point $v''_i$ is defined as the last point reached in $P_T$ before $v'_i$ which also lies on $P$ (possibly $v''_i = v'_i$; see Figure 2 for the notations). We claim the following fact:
Fig. 2. The main argument used in the proof of Proposition 2. The critical length has to be equal (modulo $m'$) to the length of the line segment $[v_i, v_{i+1}]$; on the other hand, this length is the sum of the widths of tiles.



Fact: The vertex $v_i$ is before $v''_i$ in $P$ and, if moreover $v_i = v'_i$, then no tile of $T$ is cut by $P$ before this vertex.

The above fact is proved by induction. The result is obviously true for $i = 1$. Now, assume that it also holds for a fixed integer $i$ such that $1 \le i < p$.

If $v_{i+1}$ is before $v''_i$ on $P$, the result is also obvious, since $v''_i$ is before $v''_{i+1}$. Moreover, for each vertex $v$, $s_2 \circ f_{2,T}(v)$ is equal to the pair of coordinates of $v$ (respectively modulo $m'$ and $n$). Thus $s_2 \circ f_{2,T}(v_i) \neq s_2 \circ f_{2,T}(v_{i+1})$, because of the first component. On the other hand, $s_2 \circ f_{2,T}(v'_i) = s_2 \circ f_{2,T}(v_i)$. Thus $v'_i \neq v_{i+1}$, which yields that $v_{i+1} \neq v'_{i+1}$.

Otherwise, by the induction hypothesis, the point $v''_i$ is on the line segment $[v_i, v_{i+1}]$. We only treat the case when $i$ is odd and $k_i$ is positive, since the proof is similar in the other cases.

Since $v'_i$ is a strong corner of $P_T$, $v'_i$ is on the left side of a tile $t_1$ used to define $P_T$. The vertex $v''_i$ is also on the left side of $t_1$, so the two vertices have the same horizontal coordinate. Assume that $v'_{i+1}$ is before $v_{i+1}$ in $P$. In this case, $v'_{i+1}$ is a right vertex of a tile of $T$, and the vertical line passing through $v'_{i+1}$ cuts the line segment $[v_i, v_{i+1}]$ at a vertex $v'''$.

Now consider the line segment $[v''_i, v''']$. This segment is not reduced to a vertex, since $v'_i$ and $v'_{i+1}$ do not have the same horizontal coordinate modulo $m'$. Moreover, there exists a sequence $(t_1, t_2, \ldots, t_q)$ of tiles of $T$ such that:

- the left side of $t_1$ contains $v'_i$ and $v''_i$,
- the right side of $t_q$ contains $v'_{i+1}$ and $v'''$,
- for each integer $j$ such that $0 < j < q$, the intersection of the right side of $t_j$ and the left side of $t_{j+1}$ is a line segment not reduced to a single point.

Let $\mu$ (respectively $\mu'$) denote the number of $R$-tiles (respectively $R'$-tiles) in the sequence $(t_1, t_2, \ldots, t_q)$. We have:

$$\mu m + \mu' m' \le k_i m \qquad (1)$$

and, moreover, from the definitions of $v''_i$ and $v'''$, using the horizontal coordinates:

$$\mu m + \mu' m' \equiv k_i m \pmod{m'} \qquad (2)$$

Thus, we necessarily have $(\mu, \mu') = (k_i, 0)$. Indeed, since $\gcd(m, m') = 1$, congruence (2) gives $\mu = k_i + k m'$ with $k \in \mathbb{Z}$. Inequality (1) gives $0 \le \mu \le k_i < m'$, so the integer $k$ is null. Then (1) gives $\mu' = 0$.
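This counting step can be double-checked by brute force on small dimensions. The Python sketch below uses hypothetical names. It enumerates all pairs $(\mu, \mu')$ of non-negative integers satisfying (1) and (2), and confirms that $(k_i, 0)$ is the only solution whenever $\gcd(m, m') = 1$ and $0 < k_i < m'$.

```python
from math import gcd
from typing import List, Tuple

def solutions(m: int, mp: int, k: int) -> List[Tuple[int, int]]:
    """All (mu, mu_p) >= 0 with mu*m + mu_p*m' <= k*m and mu*m + mu_p*m' = k*m (mod m')."""
    sols = []
    for mu in range(k + 1):                  # mu*m <= k*m forces mu <= k
        for mu_p in range(k * m // mp + 1):  # mu_p*m' <= k*m
            total = mu * m + mu_p * mp
            if total <= k * m and (total - k * m) % mp == 0:
                sols.append((mu, mu_p))
    return sols

def check(max_dim: int = 8) -> bool:
    for m in range(2, max_dim + 1):
        for mp in range(2, max_dim + 1):
            if gcd(m, mp) != 1:
                continue
            for k in range(1, mp):
                assert solutions(m, mp, k) == [(k, 0)], (m, mp, k)
    return True

print(check())
```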
This means that $v''_i = v_i$ and $v''' = v_{i+1}$, and that the line segment $[v_i, v_{i+1}]$ only cuts $R$-tiles. Thus $v'_i$ and $v_i$ are both on the left side of the $R$-tile $t_1$. Since their vertical coordinates are equal modulo $n$, this yields $v'_i = v_i$. With the same argument, we necessarily have $v'_{i+1} = v_{i+1}$. Thus we have

$$f_{2,T}(v_{i+1}) = f_{2,T}(v_i)\,(k_i m) \qquad (3)$$

Now, consider the path that starts at $v_i$ with a horizontal move and follows the boundary of the polygon formed by the $k_i$ $R$-tiles of the sequence until $v_{i+1}$ is reached. This path directly induces a word without possible simplification in $G_2$. Indeed, the (absolute) length of each vertical move is lower than $n$, and the length of each horizontal move is $km$ with $k$ an integer such that $0 < k \le k_i < m'$. By equality (3), this word is necessarily $k_i m$. This means that no tile is cut by the segment $[v_i, v_{i+1}]$, which completes the induction and proves the fact above.

To get the result, it now suffices to apply the fact for $i = p$. We necessarily have $v'_p = v_p$, since $v_p$ finishes the path, so the whole path cuts no tile.

Application to Stairs of Blocks 

Definition 3. A horizontal block (respectively vertical block) is a (closed) $mm' \times n'$ rectangle (respectively $m \times nn'$ rectangle).

A horizontal (respectively vertical) block $B$ is covered by a block $B'$ if the (open) top (respectively right) side of $B$ and the (open) bottom (respectively left) side of $B'$ share a line segment of length $km$, where $k$ denotes a positive integer.

A stair of blocks (see Figure 3) is a finite sequence $(B_1, B_2, \ldots, B_p)$ of blocks of the same type that satisfies two conditions. First, for each integer $i$ such that $1 \le i < p$, $B_{i+1}$ covers $B_i$. Second, there exists no subsequence $(B_{i+1}, B_{i+2}, \ldots, B_{i+n})$ whose union is an $mm' \times nn'$ rectangle.

Notice that the figure formed by the union of the blocks of a fixed stair has exactly one tiling, and that this tiling uses a single type of tile.

Proposition 3. Assume that there exists a stair $(B_1, B_2, \ldots, B_p)$ of horizontal blocks satisfying three conditions:

- the intersection of $B_1$ with the lower boundary of $F$ contains a horizontal line segment $S$ of length at least $m$,
- the intersection of $B_p$ with the upper boundary of $F$ contains a horizontal line segment $S'$ of length at least $m$,
- the tiles of the tiling of $\bigcup_{i=1}^{p} B_i$ can be placed in $F$ without inducing a contradiction in the definition of the tiling function.

Then each tiling $T$ of $F$ contains the tiling of $\bigcup_{i=1}^{p} B_i$.

Proof. From the hypothesis, there exists a vertex $v$ of $S$ such that the distance between $v$ and the lower right corner of $B_1$ is $km$, for an integer $k$ such that $0 < k < m'$. In a similar way, there exists a vertex $v'$ of $S'$ such that the distance between $v'$ and the upper right corner of $B_p$ is $k'm$, for an integer $k'$ such that $0 < k' < m'$.

The path from $v$ to $v'$ following the boundary of $\bigcup_{i=1}^{p} B_i$ counterclockwise is a special path, so by the previous proposition this path cuts no tile of $T$. The same argument applies when following the boundary of $\bigcup_{i=1}^{p} B_i$ clockwise. Therefore no tile of $T$ is cut by the boundary of $\bigcup_{i=1}^{p} B_i$.

Of course, we have a similar proposition for stairs of vertical blocks. 
Fig. 3. A stair of blocks which crosses the polygon and an associated special path.



4 Height Function and Order 

We now use the same height function as in [5]. We extend it by defining the height of tiles and of tilings, in order to obtain an order on the set of tilings of the polygon.

Definition 4. (height, predecessor, order on $G_1$) The height $h_1(g)$ of an element $g$ of $G_1 = \mathbb{Z}/m\mathbb{Z} * \mathbb{Z}/n'\mathbb{Z}$ is defined by the following axioms:

— if there exists an integer $k$ such that the canonical expression of $g$ is $(m'n)^k$ or $(m'n)^k m'$, then $h_1(g) = -l(g)$ (with $m'$ seen as an element of $\mathbb{Z}/m\mathbb{Z}$ and $n$ seen as an element of $\mathbb{Z}/n'\mathbb{Z}$),

— for each element $g$ of $G_1$, there exists a unique element $pr(g)$ (called the predecessor of $g$), equivalent to $g$, such that $h_1(pr(g)) = h_1(g) - 1$,

— for each element $g'$ of $G_1$ equivalent to both $g$ and $pr(g)$, we have $h_1(g') = h_1(g)$,

— for each other neighbor $g''$ of $g$, we have $pr(g'') = g$.

Given a pair $(g, g')$ of elements of $G_1$, we say that $g \le g'$ if there exists a sequence $(g_1, g_2, \ldots, g_p)$ of elements of $G_1$ such that $g_1 = g$, $g_p = g'$ and, for each integer $i$ such that $1 \le i < p$, $g_i$ is the predecessor of $g_{i+1}$.

Notice that the predecessor relation induces a tree structure on the group $G_1$. Also notice that the canonical expression of any element $g$ can be written $w_1 w_2$, where $w_1$ is the prefix word of maximal length such that $w_1 = (m'n)^k$ or $w_1 = (m'n)^k m'$ for an integer $k$. From this decomposition, $h_1(g)$ can be computed easily: $h_1(g) = l(g)$ if $w_1$ is the empty word, $h_1(g) = -l(g)$ if $w_2$ is the empty word, and $h_1(g) = l(w_2) - l(w_1) - 1$ otherwise.

Notice that the heights of the vertices which are on the same side $[v', v'']$ of a tile form either a singleton $\{h\}$ or a pair $\{h, h - 1\}$. With these notations, we say that $h$ is the height of $[v', v'']$ in $T$ (denoted by $h_{1,T}([v', v''])$). With the convention $f_{1,T}(O) = 1_{G_1}$, the height of horizontal sides is even and the height of vertical sides is odd. Both vertical sides of an $R$-tile $t$ have the same height, say $h$, and the height of each horizontal side is either $h - 1$ or $h + 1$ (a symmetric situation holds for $R'$-tiles). The height $h_{1,T}(t)$ of the tile $t$ is defined by $h_{1,T}(t) = h$. In other words, the height of an $R$-tile (respectively $R'$-tile) is the height of its vertical (respectively horizontal) sides. The height $h_1(T)$ of the tiling $T$ is defined by $h_1(T) = \sum_{t \in T} a(t)\, h_{1,T}(t)$, where $a(t)$ denotes the area of the tile $t$.
Local Flips. An $mm' \times (n + n')$ rectangle admits two tilings. Let $T$ be a tiling of $F$ such that some $mm' \times (n + n')$ rectangle is exactly covered by tiles of $T$. Replacing those tiles of $T$ by the tiles of the other tiling of the same rectangle is called a weak vertical flip (see Figure 4). In this way, a new tiling $T_{flip}$ is obtained.

The flip is said to be pushing up if the $R'$-tiles of $T$ are in the bottom part of the $mm' \times (n + n')$ rectangle (so that the $R'$-tiles of $T_{flip}$ are in the top part). Otherwise, the flip is said to be pushing down. If the heights of the two horizontal sides of the $mm' \times (n + n')$ rectangle differ by two units, then $h_1(T)$ and $h_1(T_{flip})$ are not equal. Otherwise $h_1(T) = h_1(T_{flip})$, and the flip is said to be height stable.

In a symmetric way, there is another type of weak flip, the horizontal flips, which are based on the two tilings of an $(m + m') \times nn'$ rectangle. We use similar definitions for them (pushing left flip, pushing right flip, height stable flip).
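A weak vertical flip amounts to swapping the two tilings of an $mm' \times (n + n')$ rectangle: one row of $m'$ $R$-tiles and one row of $m$ $R'$-tiles, in either order. The minimal Python sketch below uses our own conventions: tiles are `(x, y, width, height)` tuples, and `(x0, y0)` is the lower-left corner of the rectangle. It does not evaluate heights, so it does not decide whether a flip is height stable.

```python
from typing import List, Set, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)

def _row(x0: int, y: int, w: int, h: int, count: int) -> List[Tile]:
    return [(x0 + j * w, y, w, h) for j in range(count)]

def vertical_flip_tilings(x0, y0, m, n, mp, np_) -> Tuple[Set[Tile], Set[Tile]]:
    """The two tilings of the mm' x (n + n') rectangle with lower-left corner (x0, y0)."""
    r_bottom = set(_row(x0, y0, m, n, mp) + _row(x0, y0 + n, mp, np_, m))
    rp_bottom = set(_row(x0, y0, mp, np_, m) + _row(x0, y0 + np_, m, n, mp))
    return r_bottom, rp_bottom

def weak_vertical_flip(tiling, x0, y0, m, n, mp, np_) -> Set[Tile]:
    r_bottom, rp_bottom = vertical_flip_tilings(x0, y0, m, n, mp, np_)
    tiles = set(tiling)
    if rp_bottom <= tiles:  # R'-tiles at the bottom: a pushing up flip
        return (tiles - rp_bottom) | r_bottom
    if r_bottom <= tiles:   # R'-tiles on top: a pushing down flip
        return (tiles - r_bottom) | rp_bottom
    raise ValueError("rectangle is not exactly covered by tiles of this tiling")

def covered_cells(tiles) -> Set[Tuple[int, int]]:
    return {(x + i, y + j) for x, y, w, h in tiles for i in range(w) for j in range(h)}
```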





Fig. 4. A weak flip on the left, a strong flip on the right.



There is another type of flip, called a strong flip, based on the fact that there are two ways to tile an $mm' \times nn'$ rectangle (see Figure 4). Such a flip always changes the value of $h_1(T)$.

Definition 5. (equivalence, order on the set of tilings) Two tilings are said to be height equivalent if one can pass from one to the other by a sequence of height stable flips.

Given a pair $(T, T')$ of tilings, we say that $T$ covers $T'$ if there exists a pair $(T_{aux}, T'_{aux})$ of tilings of $F$ satisfying three conditions. First, $T$ is equivalent to $T_{aux}$ and $T'$ is equivalent to $T'_{aux}$. Second, $h_1(T_{aux}) > h_1(T'_{aux})$. Third, $T_{aux}$ is deduced from $T'_{aux}$ by one flip. The order relation that is the transitive closure of the covering relation is denoted by $>$.



Critical Tiling. For the order defined above, there exists at least one minimal tiling. If a tiling is minimal, then all the tilings height equivalent to it are also minimal.

Definition 6. A critical tiling is a minimal tiling on which neither a height stable pushing down flip nor a height stable pushing left flip can be performed.

To prove that a critical tiling exists, it suffices to consider the potential function $pot$ defined as follows. For each tiling $T$, $pot(T)$ is the sum of the vertical coordinates of the horizontal sides of the $R'$-tiles of $T$ and the horizontal coordinates of the vertical sides of the $R'$-tiles. Notice that a pushing down (respectively left) flip removes $2mn'$ (respectively $2m'n$) units from the function $pot$. Thus, for any weak flip, $pot(T)$ and $pot(T_{flip})$ are comparable. A minimal tiling with the lowest potential value among the minimal tilings is necessarily critical.

Proposition 4. Let $T_{crit}$ be a critical tiling. Let $M$ denote the highest value of $h_{1,T_{crit}}$ in $F$, and let $M'$ denote the highest value on $\delta F$.

We have $M' \ge M - 1$. Moreover, when $M' = M - 1$ and $M$ is odd:

— there exists a set of blocks with pairwise disjoint interiors (called critical blocks) whose tilings are included in $T_{crit}$, such that, for each vertex $v$, $h_{1,T_{crit}}(v) = M$ if and only if $v$ is in the interior of a line segment $[v', v'']$ which is the common side of two $R'$-tiles of the same block,

— each critical block of $T_{crit}$ whose bottom side does not share a line segment with $\delta F$ covers another critical block of $T_{crit}$,

— there is no set of $n$ critical blocks whose union is an $mm' \times nn'$ rectangle,

— the value $M'$ is reached on both the lower and the upper boundary of $F$.

We omit the proof. There exists a symmetric proposition for $M$ even.

Corollary 1. We keep the notations above. Assume that $M'$ is even. For each vertex $v$ of $\delta F$ such that $h_1(v) = M'$, at least one of the following alternatives holds:

— the vertex $v$ is on the boundary of an $R$-tile of $T_{crit}$ whose two vertical sides have height $M' - 1$,

— the vertex $v$ is on a horizontal side of a critical block of $T_{crit}$. Moreover, if $v$ is on the upper boundary of $F$, then there exists a stair $(B_1, B_2, \ldots, B_p)$ of critical blocks such that the block $B_1$ meets the lower boundary of $F$ and $v$ is on the top side of $B_p$.

Proof. Obvious, since any other situation contradicts Proposition 4.



Algorithm of Tiling. We present the algorithm below.

Initialization: Construct $f_1$ and $f_2$ on $\delta F$ (if a contradiction arises, there is no tiling).

Main loop: Consider the set $S_{M'}$ of vertices $v$ of $\delta F$ such that $h_1(v) = M'$. We assume that $M'$ is even (the case when $M'$ is odd is treated in a symmetric way).

If $S_{M'}$ is included in the lower boundary of $F$, then place an $R$-tile $t$ in the neighborhood of a vertex $v$ of $S_{M'}$ so that the maximal height reached on the boundary of $t$ is $M'$. There is at most one possibility with no contradiction for $f_{1,T}$ and $f_{2,T}$.

Otherwise, take a vertex $v$ of $S_{M'}$ on the upper boundary of $F$. Suppose there exists a stair $(B_1, B_2, \ldots, B_p)$ of blocks such that the block $B_1$ meets the lower boundary of $F$, $v$ is on the top side of $B_p$, and the stair creates no contradiction for $f_{1,T}$ and $f_{2,T}$. Then place the tiles of the tiling of this stair of blocks. If no such stair exists, then place an $R$-tile $t$ in the neighborhood of the vertex $v$ so that the maximal height reached on the boundary of $t$ is $M'$ (again, there is at most one possibility with no contradiction for $f_{1,T}$ and $f_{2,T}$).

Remove the tiled area from $F$, and update $M'$ and $S_{M'}$. Repeat the main loop until the remaining figure is empty or a contradiction appears in the tiling functions.
Corollary 2. Assume that $F$ can be tiled. The above algorithm gives a critical tiling. In particular, there exists a unique critical tiling $T_{crit}$.

Proof. By Proposition 4, if $S_{M'}$ is included in the lower boundary of $F$, then $M = M'$, which guarantees that the tile is placed correctly in this alternative. On the other hand, Corollary 1 guarantees that the tiles are placed correctly when $S_{M'}$ contains a vertex of the upper boundary of $F$.

Theorem 1. (connectivity) Let $m$, $m'$, $n$ and $n'$ be fixed integers, each of them at least 2. Let $(T, T')$ be a pair of tilings of a polygon $F$ by translated copies of $m \times n$ and $m' \times n'$ rectangles. One can pass from $T$ to $T'$ by a sequence of local flips.

Proof. Obvious, from Corollary 2.

Theorem 2. (quadratic algorithm) Let $m$, $m'$, $n$ and $n'$ be fixed integers, each of them at least 2. There exists a tiling algorithm, quadratic in the area $A(F)$ of $F$, which, given a polygon $F$, either produces a tiling of $F$ by translated copies of $m \times n$ and $m' \times n'$ rectangles or reports that no such tiling exists.

Proof. We have seen in Corollary 2 that the algorithm given above is a tiling algorithm, so we only have to study its complexity. Each pass through the loop places at least one tile, so there are at most $A(F)$ passes through the loop.

Each pass through the loop costs $O(A(F))$ time units. The key point is that the search for a stair of blocks crossing the polygon is similar to a depth-first search in a tree with $O(A(F))$ vertices.

5 Related Problems and Extensions 

Case with a Side of Unit Length. Until this section, we have assumed that all the dimensions of the tiles are at least 2. When a tile has a side of unit length, at least one of the tiling groups is degenerate, i.e. reduced to a cyclic group. In such a case, the same approach can be used, with some simplifications due to the degeneracy. We also obtain a connectivity result and the theorem below, which contains the results of [5] about bars:

Theorem 3. (linear algorithm) Let $m$, $m'$ and $n'$ be fixed positive integers. There exists a tiling algorithm, linear in the area $A(F)$ of $F$, which, given a polygon $F$, either produces a tiling of $F$ by translated copies of $m \times 1$ and $m' \times n'$ rectangles or reports that no such tiling exists.

Extensions. The same ideas can be used for rectangles with real dimensions. One can prove a connectivity result and give an algorithm, although there is no standard framework for studying complexity when the inputs are real numbers.

A motivating direction of research is a better understanding of the structure of the space of tilings, i.e. the graph whose vertices are the tilings of a polygon $F$ and in which two tilings are linked by an edge if and only if they differ by a single flip. Some very strong structural results (lattice structures, effective computation of geodesics between fixed tilings) have previously been found in the special case when each tile has a side of unit width [14]. Can these results be extended to the main case?
Abstract. We obtain the following results related to dynamic versions of the shortest-paths problem:

(i) Reductions that show that the incremental and decremental single-source shortest-paths problems, for weighted directed or undirected graphs, are, in a strong sense, at least as hard as the static all-pairs shortest-paths problem. We also obtain slightly weaker results for the corresponding unweighted problems.

(ii) A randomized fully-dynamic algorithm for the all-pairs shortest-paths problem in directed unweighted graphs with an amortized update time of $\tilde{O}(m\sqrt{n})$ and a worst-case query time of $O(n^{3/4})$.

(iii) A deterministic $O(n^2 \log n)$ time algorithm for constructing an $O(\log n)$-spanner with $O(n)$ edges for any weighted undirected graph on $n$ vertices. The algorithm uses a simple algorithm for incrementally maintaining single-source shortest-paths trees up to a given distance.



1 Introduction 

The objective of a dynamic shortest path algorithm is to efficiently process an online sequence of update and query operations. Each update operation inserts or deletes edges from an underlying dynamic graph. Each query operation asks for the distance between two specified vertices in the current graph. A dynamic algorithm is said to be fully dynamic if it can handle both insertions and deletions. An incremental algorithm is an algorithm that can handle insertions, but not deletions, and a decremental algorithm is an algorithm that can handle deletions, but not insertions. Incremental and decremental algorithms are sometimes referred to as being partially dynamic. An all-pairs shortest paths (APSP) algorithm is an algorithm that can report distances between any two vertices of the graph. A single-source shortest paths (SSSP) algorithm can only report distances from a given source vertex.

We present three results related to dynamic shortest paths problems. We 
begin with simple reductions that show that the innocent looking incremental 
and decremental SSSP problems are, in a strong sense, at least as hard as the 
static APSP problem. This may explain the lack of progress on these problems, 
and indicates that it will be difficult to improve classical algorithms for these 
problems, such as the decremental algorithm of Even and Shiloach [9]. 

We then present a new fully dynamic APSP algorithm for unweighted directed graphs. The amortized update time of the algorithm is $\tilde{O}(m\sqrt{n})$ and the worst-case query time is $O(n^{3/4})$. The algorithm is randomized, and the results it returns are correct with very high probability. The new algorithm should be compared with a recent algorithm of Demetrescu and Italiano [8] and its slight improvement by Thorup [26]. Their algorithm, which works for weighted directed graphs, has an amortized update time of $\tilde{O}(n^2)$ and a query time of $O(1)$. For sparse enough graphs our new algorithm has a faster update time. The query cost, alas, is much larger.

The new algorithm can also be compared to fully dynamic reachability algorithms for directed graphs obtained by the authors in [20] and [21]. A reachability algorithm is only required to determine, given two vertices $u$ and $v$, whether there is a directed path from $u$ to $v$ in the graph. The reachability problem, also referred to as the transitive closure problem, is, of course, easier than the APSP problem. A fully dynamic reachability algorithm with an amortized update time of $O(m\sqrt{n})$ and a worst-case query time of $O(\sqrt{n})$ is presented in [20]. A fully dynamic reachability algorithm with an amortized update time of $O(m + n \log n)$ and a worst-case query time of $O(n)$ is presented in [21].

Finally, we present a simple application of incremental SSSP algorithms, showing that they can be used to speed up the operation of the greedy algorithm for constructing spanners. In particular, we obtain an $O(n^2 \log n)$ time algorithm for constructing an $O(\log n)$-spanner with $O(n)$ edges for any weighted undirected graph on $n$ vertices. The previously fastest algorithm for constructing such spanners runs in $O(mn)$ time.
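For reference, the classical greedy spanner algorithm that is being sped up works as follows. It scans the edges by non-decreasing weight and keeps an edge $(u, v)$ only if the current spanner has no path between $u$ and $v$ of length at most $t \cdot w(u, v)$. The sketch below is only this plain baseline, which runs one bounded Dijkstra search per edge. The paper's contribution, replacing these searches by an incremental SSSP structure, is not shown. The function names are hypothetical. For an $O(\log n)$-spanner, the stretch parameter `t` would be of order $\log n$.

```python
import heapq
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]

def bounded_distance(adj: Dict[int, List[Tuple[int, float]]], s: int, target: int,
                     limit: float) -> float:
    """Dijkstra from s that never explores beyond `limit`; inf if target is not reached."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

def greedy_spanner(n: int, edges: List[Edge], t: float) -> List[Edge]:
    """Greedy t-spanner of an undirected weighted graph on vertices 0..n-1."""
    adj: Dict[int, List[Tuple[int, float]]] = {u: [] for u in range(n)}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if bounded_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On a triangle with weights 1, 1 and 2 and $t = 3$, the heaviest edge is dropped, because the two light edges already give a path of length $2 \le 3 \cdot 2$.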

The rest of this paper is organized as follows. In the next section we describe the hardness results for incremental and decremental SSSP and discuss their implications. In Section 3 we present our new fully dynamic APSP algorithm. In Section 4 we present our improved spanner construction algorithm. We end in Section 5 with some concluding remarks and open problems.



2 Hardness of Partially Dynamic SSSP Problems 

We start with two simple reductions that show that the incremental and decremental weighted SSSP problems are at least as hard as the static weighted APSP problem. We then present two similar reductions that show that the incremental and decremental unweighted SSSP problems are at least as hard as several natural static graph problems, such as Boolean matrix multiplication and the problem of finding all edges of a graph that are contained in triangles.

Let $A$ be an incremental (decremental) algorithm for the weighted (unweighted) directed (undirected) SSSP problem. We let $\mathit{init}_A(m, n)$ be the initialization time of $A$ on a graph with $m$ edges and $n$ vertices. We let $\mathit{update}_A(m, n)$ be the amortized edge insertion (deletion) time of $A$, and $\mathit{query}_A(m, n)$ be the amortized query time of $A$, where $m$ and $n$ are the numbers of edges and vertices in the graph at the time of the operation. We assume that the functions $\mathit{init}_A(m, n)$, $\mathit{update}_A(m, n)$ and $\mathit{query}_A(m, n)$ are monotone in $m$ and $n$.

Theorem 1. Let $A$ be an incremental (decremental) algorithm for the weighted directed (undirected) SSSP problem. Then there is an algorithm for the static APSP problem for weighted graphs that runs in $O(\mathit{init}_A(m + n, n + 1) + n \cdot \mathit{update}_A(m + n, n + 1) + n^2 \cdot \mathit{query}_A(m + n, n + 1))$ time.

Proof. Let $G = (V, E)$ be a graph, with $|V| = n$ and $|E| = m$, and let $w : E \to \mathbb{R}^+$ be an assignment of non-negative weights to its edges. The proof works for both directed and undirected graphs. We assume, without loss of generality, that $V = \{1, 2, \ldots, n\}$. Let $W = \max_{e \in E} w(e)$ be the maximum edge weight.

Assume, at first, that $A$ is a decremental algorithm. We construct a new graph $G_0 = (V \cup \{0\}, E \cup (\{0\} \times V))$, where $0$ is a new source vertex. Each new edge $(0, j)$, where $1 \le j \le n$, is assigned the weight $j \cdot nW$. (See Figure 1(a).) The graph $G_0$, composed of $n + 1$ vertices and $m + n$ edges, is passed as the initial graph to the decremental algorithm $A$, with $0$ as the source. After $A$ is initialized, we perform the $n$ queries $query(j)$, for $1 \le j \le n$. Each query $query(j)$ returns $\delta_{G_0}(0, j)$, the distance from $0$ to $j$ in $G_0$. As the weight of the edge $(0, 1)$ is substantially smaller than the weight of all other edges emanating from the source, it is easy to see that $\delta_G(1, j) = \delta_{G_0}(0, j) - nW$ for every $1 \le j \le n$. We now delete the edge $(0, 1)$ from $G_0$ and perform the $n$ queries $query(j)$, for $1 \le j \le n$, again. We now have $\delta_G(2, j) = \delta_{G_0}(0, j) - 2nW$ for every $1 \le j \le n$. Repeating this process $n - 2$ more times, we obtain all distances in the original graph by performing only $n$ edge deletions and $n^2$ queries.

The proof when $A$ is an incremental algorithm is analogous. The only difference is that we now insert the edges $(0, j)$ one by one, in reverse order. We first insert the edge $(0, n)$, with weight $n^2 W$, then the edge $(0, n-1)$ with weight $(n-1)nW$, and so on. □
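To make the decremental half of this reduction concrete, the following Python sketch may help. It is only an illustration under stated assumptions: the class `NaiveDecrementalSSSP` is a hypothetical stand-in for the algorithm $A$ that simply reruns Dijkstra's algorithm after every deletion (so it gains nothing in speed), vertices are numbered $1, \ldots, n$, and the separation constant is $M = nW$, replaced by $M = 1$ when $W = 0$ so that every finite distance stays below $M$. Unreachable pairs are reported as infinity, which the proof leaves implicit.

```python
import heapq
from math import inf


def dijkstra(adj, src, n_vertices):
    """Textbook Dijkstra; adj maps a vertex to a list of (neighbor, weight)."""
    dist = [inf] * n_vertices
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj.get(x, ()):
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


class NaiveDecrementalSSSP:
    """Hypothetical stand-in for algorithm A: recomputes Dijkstra after every deletion."""

    def __init__(self, n_vertices, edges, source):
        self.n, self.source = n_vertices, source
        self.edges = dict(edges)  # (x, y) -> weight
        self._rebuild()

    def _rebuild(self):
        adj = {}
        for (x, y), w in self.edges.items():
            adj.setdefault(x, []).append((y, w))
        self.dist = dijkstra(adj, self.source, self.n)

    def delete(self, edge):
        del self.edges[edge]
        self._rebuild()

    def query(self, v):
        return self.dist[v]


def apsp_via_decremental(n, edges):
    """Static APSP on vertices 1..n (edges: dict (i, j) -> weight >= 0) via the reduction."""
    W = max(edges.values(), default=0)
    M = n * W if W > 0 else 1  # any M > (n - 1) * W separates the rounds
    g0 = dict(edges)
    for j in range(1, n + 1):
        g0[(0, j)] = j * M  # new source 0; edge (0, j) gets weight j * M
    A = NaiveDecrementalSSSP(n + 1, g0, source=0)
    D = [[inf] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Edges (0, 1), ..., (0, i - 1) are gone, so (0, i) is the cheapest source edge.
        for j in range(1, n + 1):
            d0 = A.query(j)
            D[i][j] = d0 - i * M if d0 < (i + 1) * M else inf
        if i < n:
            A.delete((0, i))
    return D
```

The row $D[i]$ is read off after $i-1$ deletions, exactly as in the proof; only the stand-in for $A$ would have to be replaced by a genuinely faster decremental algorithm to obtain a faster APSP algorithm.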

We note that the simple reduction just described works for undirected, di- 
rected, as well as acyclic directed graphs (DAGs). We next move to unweighted 
versions of the problem. 

Theorem 2. Let $A$ be an incremental (decremental) algorithm for the unweighted directed (undirected) SSSP problem. Then, there is an algorithm that multiplies two Boolean $n \times n$ matrices, with a total number of $m$ 1's, in $O(\mathrm{init}_A(m+2n, 4n) + n \cdot \mathrm{update}_A(m+2n, 4n) + n^2 \cdot \mathrm{query}_A(m+2n, 4n))$ time.

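Proof. Let $A$ and $B$ be two Boolean $n \times n$ matrices. Let $C = AB$ be their Boolean product. Construct a graph $G = (V, E)$ as follows: $V = \{s_i, u_i, v_i, w_i \mid 1 \le i \le n\}$, and $E = \{(s_i, s_{i+1}) \mid 1 \le i < n\} \cup \{(s_i, u_i) \mid 1 \le i \le n\} \cup \{(u_i, v_j) \mid a_{ij} = 1,\ 1 \le i, j \le n\} \cup \{(v_i, w_j) \mid b_{ij} = 1,\ 1 \le i, j \le n\}$. (See Figure 1(b).) The graph $G$ is composed of $4n$ vertices and $m + 2n - 1$ edges. Let $s = s_1$. It is easy to see that $\delta_G(s, w_j) = 3$ if and only if $c_{1j} = 1$. We now delete the edge $(s_1, u_1)$. Now, $\delta_G(s, w_j) = 4$ if and only if $c_{2j} = 1$. We then delete the edge $(s_2, u_2)$, and so on. (In general, once $(s_1, u_1), \ldots, (s_{i-1}, u_{i-1})$ have been deleted, the first vertex $u_l$ reached by a path from $s$ satisfies $l \ge i$ and lies at distance at least $i$, so $\delta_G(s, w_j) = i + 2$ if and only if $c_{ij} = 1$.) Again we use only $n$ delete operations and $n^2$ queries. The incremental case is handled in a similar manner. □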
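A similar Python sketch of the reduction in Theorem 2 follows. It assumes the directed variant, 0-indexed vertices ($s_i = i$, $u_i = n+i$, $v_i = 2n+i$, $w_i = 3n+i$), and a plain BFS rerun after each deletion as a hypothetical stand-in for the decremental SSSP algorithm. With 0-indexed rows, row $i$ of the product is read off as the set of vertices $w_j$ at distance exactly $i+3$ from $s_0$.

```python
from collections import deque


def bfs(n_vertices, edges, src):
    """Unweighted distances from src over a set of directed edges (-1 = unreachable)."""
    adj = [[] for _ in range(n_vertices)]
    for x, y in edges:
        adj[x].append(y)
    dist = [-1] * n_vertices
    dist[src] = 0
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if dist[y] < 0:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def boolean_product_via_sssp(A, B):
    """C = AB over the Boolean semiring, via the graph in the proof of Theorem 2."""
    n = len(A)
    # Vertex ids (0-indexed): s_i = i, u_i = n + i, v_i = 2n + i, w_i = 3n + i.
    edges = {(i, i + 1) for i in range(n - 1)}                    # s_i -> s_{i+1}
    edges |= {(i, n + i) for i in range(n)}                        # s_i -> u_i
    edges |= {(n + i, 2 * n + j)
              for i in range(n) for j in range(n) if A[i][j]}      # u_i -> v_j iff a_ij = 1
    edges |= {(2 * n + i, 3 * n + j)
              for i in range(n) for j in range(n) if B[i][j]}      # v_i -> w_j iff b_ij = 1
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        # Naive stand-in for a decremental SSSP structure: rerun BFS from s_0 each round.
        dist = bfs(4 * n, edges, 0)
        for j in range(n):
            # With (s_0, u_0), ..., (s_{i-1}, u_{i-1}) deleted: dist(w_j) == i + 3 iff c_ij = 1.
            C[i][j] = 1 if dist[3 * n + j] == i + 3 else 0
        edges.discard((i, n + i))                                  # delete (s_i, u_i)
    return C
```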

Discussion. All known algorithms for the static APSP problems in weighted directed or undirected graphs run in $\Omega(mn)$ time. A running time of $O(mn + n^2 \log n)$ is obtained by running Dijkstra's algorithm from each vertex (see [10]). Slightly faster algorithms are available, in various settings. For the best available results see [10], [25], [13], [18], [17]. Karger et al. [15] show that any path-comparison algorithm for the problem must have a running time of $\Omega(mn)$.

Fig. 1. Reductions of static problems to incremental or decremental SSSP problems

The reduction of Theorem 1 shows that if there is an incremental or decremental SSSP algorithm that can handle $n$ update operations and $n^2$ query operations in $o(mn)$ time, then there is also an $o(mn)$ time algorithm for the static APSP problem. We note that the trivial ‘dynamic’ SSSP algorithm that simply constructs a shortest paths tree from scratch after each update operation handles $n$ update operations in $O(mn)$ time. Almost any improvement of this trivial algorithm, even with much increased query times, will yield improved results for the static APSP problem.

An interesting open problem is whether there are incremental or decremental SSSP algorithms for weighted graphs that can handle $m$ updates and queries in $O(mn)$ time. (Note that the number of updates here is $m$ and not $n$.)

We next consider unweighted versions of the SSSP problem. A classical result 
in this area is the following: 

Theorem 3 (Even and Shiloach [9]). There is a decremental algorithm for maintaining the first $k$ levels of a single-source shortest-paths tree, in a directed or undirected unweighted graph, whose total running time, over all deletions, is $O(km)$, where $m$ is the initial number of edges in the graph. Each query can be answered in $O(1)$ time.
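For concreteness, here is a Python sketch of the level-based idea behind Theorem 3, written for directed unweighted graphs. It is a simplified reading rather than the exact procedure of [9]: each vertex keeps its level (capped at $k+1$, meaning "farther than $k$") and a pointer into its list of in-neighbors. Since levels never decrease under deletions, the pointer only moves forward while a vertex stays at the same level; charging the scans to level increases is what gives the $O(km)$ total bound. The class and method names are illustrative.

```python
from collections import deque


class DecrementalBFS:
    """Levels 0..k of a BFS tree from s under edge deletions (level k + 1 = farther than k)."""

    def __init__(self, n, edges, s, k):
        self.s, self.INF = s, k + 1
        self.alive = set(edges)                 # directed edges (x, y) currently present
        self.inn = [[] for _ in range(n)]       # in-neighbor lists; deleted edges skipped lazily
        self.out = [[] for _ in range(n)]
        for x, y in self.alive:
            self.out[x].append(y)
            self.inn[y].append(x)
        self.level = [self.INF] * n
        self.level[s] = 0
        queue = deque([s])
        while queue:                            # initial BFS, truncated at depth k
            x = queue.popleft()
            if self.level[x] >= k:
                continue
            for y in self.out[x]:
                if self.level[y] == self.INF:
                    self.level[y] = self.level[x] + 1
                    queue.append(y)
        self.ptr = [0] * n                      # scan position in inn[x] at the current level of x

    def _has_parent(self, x):
        """Is there a live edge (y, x) with level[y] == level[x] - 1?"""
        lst, target = self.inn[x], self.level[x] - 1
        while self.ptr[x] < len(lst):
            y = lst[self.ptr[x]]
            if (y, x) in self.alive and self.level[y] == target:
                return True
            self.ptr[x] += 1                    # y cannot come back to level target: levels only grow
        return False

    def delete(self, x, y):
        self.alive.discard((x, y))
        work = deque([y])
        while work:
            z = work.popleft()
            if z == self.s or self.level[z] == self.INF or self._has_parent(z):
                continue
            self.level[z] += 1                  # no parent one level up, so z moves down a level
            self.ptr[z] = 0
            work.extend(c for c in self.out[z] if (z, c) in self.alive)
            if self.level[z] < self.INF:
                work.append(z)                  # z may have to move down again

    def query(self, v):
        """Distance from s to v if it is at most k, else None."""
        return None if self.level[v] == self.INF else self.level[v]
```

Each vertex moves down at most $k+1$ times, and each move costs time proportional to its degree, which is where the $O(km)$ bound comes from; queries just read the stored level.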

It is easy to obtain an incremental variant of this algorithm. Such a variant 
is described, for completeness, in Section 4, where it is also used. 

How efficient is the algorithm of [9], and what are the prospects of improving it? If $k$, the number of levels required, is small, then the running time of the algorithm is close to optimal, as $\Omega(m)$ is an obvious lower bound. But, if a complete shortest paths tree is to be maintained, i.e., $k = n-1$, the running time of the algorithm becomes $O(mn)$. How hard will it be to improve this result?

Our reductions for the unweighted problems are slightly weaker than the ones we have for the weighted problems. We cannot reduce the static APSP problems to the partially dynamic SSSP problems, but we can still reduce the Boolean matrix multiplication problem to them. The APSP problem for undirected unweighted graphs can be reduced to the Boolean matrix multiplication problem (see [11], [23], [24]), but these reductions do not preserve sparsity.

The fastest known combinatorial algorithm for computing the Boolean product of two $n \times n$ matrices that contain a total of $m$ 1's runs in $O(mn)$ time. By a combinatorial algorithm here we refer to an algorithm that does not rely on fast algebraic matrix multiplication techniques. Using such algebraic techniques it is possible to multiply the matrices in $O(n^{2.376})$ time (see [7]), and also in $O(m^{0.7} n^{1.2} + n^{2+o(1)})$ time (see [29]). Obtaining a combinatorial Boolean matrix multiplication algorithm whose running time is $O((mn)^{1-\epsilon} + n^2)$, or $O(n^{3-\epsilon})$, for some $\epsilon > 0$, is a major open problem.

The reduction of Theorem 2 shows that reducing the total running time of the algorithm of [9] to $o(mn)$, using only combinatorial means, is at least as hard as obtaining an improved combinatorial Boolean matrix multiplication algorithm. Also, via the reduction of the static APSP problem to Boolean matrix multiplication, we get that an incremental or decremental SSSP algorithm with a total running time of $O(n^{3-\epsilon})$ and a query time of $O(n^{1-\epsilon})$, for some $\epsilon > 0$, will yield a combinatorial algorithm for the static APSP problem with a running time of $O(n^{3-\epsilon})$. We believe that this provides strong evidence that improving the algorithm of [9] will be very hard.

It is also not difficult to see that if the first $k$ levels of a single-source shortest-paths tree can be incrementally or decrementally maintained in $o(km)$ time, then there is an $o(mn)$ time Boolean matrix multiplication algorithm. The details will appear in the full version of the paper.

Chan [5] describes a simple reduction from the rectangular Boolean matrix 
multiplication problem to the fully dynamic subgraph connectivity problem. It 
is similar in spirit to our reduction. The details, and the problems involved, are 
different, however. 

As a final remark we note that we have reduced the APSP problem and the Boolean matrix multiplication problem to offline versions of the incremental or decremental SSSP problem. It will thus be difficult to obtain improved algorithms for partially dynamic SSSP problems even if all the update and query operations are given in advance.



3 Fully Dynamic All-Pairs Shortest Paths 

In this section we obtain a new fully dynamic algorithm for the all-pairs shortest paths problem. The algorithm relies on ideas of [14] and [20]. We rely on the following result of [14] and a simple observation of [28]:

Theorem 4 (Henzinger and King [14]). There is a randomized decremental all-pairs shortest-paths algorithm for directed unweighted graphs whose total running time, over all deletions, is $O(\frac{mn^2 \log n}{t} + mn \log^2 n)$ and whose query time is $O(t)$, where $m$ and $n$ are the number of edges and vertices in the initial graph, and $t \ge 1$ is a parameter. (In particular, for $t \le n/\log n$, the total running time is $O(\frac{mn^2 \log n}{t})$.) Every result returned by the algorithm is correct with a probability of at least $1 - n^{-c}$, where $c$ is a parameter set in advance.

Lemma 1 (Ullman and Yannakakis [28]). Let $G = (V, E)$ be a directed graph on $n$ vertices. Let $1 \le k \le n$, and let $S$ be a random subset of $V$ obtained by selecting each vertex, independently, with probability $p = (c \ln n)/k$. (If $p \ge 1$, we let $S$ be $V$.) If $p$ is a path in $G$ of length at least $k$, then with a probability of at least $1 - n^{-c}$ at least one of the vertices on $p$ belongs to $S$.
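The bound in Lemma 1 comes from a one-line estimate. Assuming $p < 1$ (otherwise $S = V$), fix any $k$ distinct vertices of the path $p$ (a path of length at least $k$ has at least $k+1$ vertices). Each of them misses $S$ independently with probability $1 - p$, and $1 - x \le e^{-x}$ gives:

```latex
\Pr\bigl[\text{no vertex of } p \text{ lies in } S\bigr]
  \;\le\; (1-p)^{k}
  \;\le\; e^{-pk}
  \;=\; e^{-c \ln n}
  \;=\; n^{-c}.
```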

The new algorithm works in phases as follows. At the beginning of each phase, the current graph $G = (V, E)$ is passed to the decremental algorithm of [14] (Theorem 4). A random subset $S$ of the vertices, of size $(cn \ln n)/k$, is chosen, where $k$ is a parameter to be chosen later. The standard BFS algorithm is then used to build shortest paths trees to and from all the vertices of $S$. If $w \in V$, we let $T_{in}(w)$ be a tree of shortest paths to $w$, and $T_{out}(w)$ be a tree of shortest paths from $w$. The set $C$ is initially empty.

An insertion of a set $E'$ of edges, all touching a vertex $v \in V$, said to be the center of the insertion, is handled as follows. First, if $|C| \ge t$, where $t$ is a second parameter to be chosen later, then the current phase is declared over, and all the data structures are reinitialized. Next, the center $v$ is added to the set $C$, and the first $k$ levels of shortest paths trees $\hat{T}_{in}(v)$ and $\hat{T}_{out}(v)$, containing shortest paths to and from $v$, are constructed. The trees $\hat{T}_{in}(v)$ and $\hat{T}_{out}(v)$ are constructed and maintained using the algorithm of [9] (Theorem 3). Finally, shortest paths trees $T_{in}(w)$ and $T_{out}(w)$, for every $w \in S$, are constructed from scratch. (Note that we use $\hat{T}_{in}(v)$ and $\hat{T}_{out}(v)$ to denote the trees associated with a vertex $v \in C$, and $T_{in}(w)$ and $T_{out}(w)$, without the hats, to denote the trees of a vertex $w \in S$. The former are decrementally maintained, up to depth $k$, while the latter are rebuilt from scratch following each update operation.)

A deletion of an arbitrary set $E'$ of edges is handled as follows. First, the edges of $E'$ are removed from the decremental data structure initialized at the beginning of the current phase, using the algorithm of [14] (Theorem 4). Next, the algorithm of [9] (Theorem 3) is used to update the shortest paths trees $\hat{T}_{in}(v)$ and $\hat{T}_{out}(v)$, for every $v \in C$. Finally, the trees $T_{in}(w)$ and $T_{out}(w)$, for every $w \in S$, are again rebuilt from scratch.

A distance query Query$(u, v)$, asking for the distance $d(u, v)$ from $u$ to $v$ in the current version of the graph, is handled using the following three-stage process. First, we query the decremental data structure, which keeps track of all delete operations performed in the current phase but ignores all insert operations, and get an answer $\ell_1$. We clearly have $d(u,v) \le \ell_1$, as all edges in the decrementally maintained graph are also edges of the current graph. Furthermore, if there is a shortest path from $u$ to $v$ in the current graph that does not use any edge that was inserted during the current phase, then $d(u,v) = \ell_1$.

Next, we try to find a shortest path from $u$ to $v$ that passes through one of the insertion centers contained in $C$. For every $w \in C$, we query $\hat{T}_{in}(w)$ for the distance from $u$ to $w$ and $\hat{T}_{out}(w)$ for the distance from $w$ to $v$, and add these two numbers. (If $d(u,w) > k$, then $u$ is not contained in $\hat{T}_{in}(w)$ and the distance from $u$ to $w$, in the present context, is taken to be $\infty$. The case $d(w,v) > k$ is handled similarly.) By taking the minimum of all these numbers we get a second answer that we denote by $\ell_2$. Again, we have $d(u,v) \le \ell_2$. Furthermore, if $d(u,v) \le k$, and there is a shortest path from $u$ to $v$ in the current graph that passes through a vertex that was an insertion center in the current phase of the algorithm, then $d(u,v) = \ell_2$.

Finally, we look for a shortest path from $u$ to $v$ that passes through a vertex of $S$. This is done in a similar manner by examining the trees associated with the vertices of $S$. The answer obtained using this process is denoted by $\ell_3$. (If there is no path from $u$ to $v$ that passes through a vertex of $S$, then $\ell_3 = \infty$.) The final answer returned by the algorithm is $\min\{\ell_1, \ell_2, \ell_3\}$.

Init$(G, k, t)$:
  1. Init-Dec$(G, t)$
  2. $C \leftarrow \emptyset$
  3. $S \leftarrow$ Random$(V, (cn \ln n)/k)$
  4. Build-Trees$(S)$

Insert$(E', v)$:
  1. $E \leftarrow E \cup E'$
  2. if $|C| \ge t$ then Init$(G, k, t)$
  3. $C \leftarrow C \cup \{v\}$
  4. Init-Tree$(\hat{T}_{in}(v), E, k)$
  5. Init-Tree$(\hat{T}_{out}(v), E, k)$
  6. Build-Trees$(S)$

Delete$(E')$:
  1. $E \leftarrow E - E'$
  2. Delete-Dec$(E')$
  3. for every $v \in C$
  4.     Delete-Tree$(\hat{T}_{in}(v), E', k)$
  5.     Delete-Tree$(\hat{T}_{out}(v), E', k)$
  6. Build-Trees$(S)$

Build-Trees$(S)$:
  1. for every $w \in S$
  2.     BFS$(T_{in}(w), E)$
  3.     BFS$(T_{out}(w), E)$

Query$(u, v)$:
  1. $\ell_1 \leftarrow$ Query-Dec$(u, v)$
  2. $\ell_2 \leftarrow \min_{w \in C}$ Query-Tree$(\hat{T}_{in}(w), u)$ + Query-Tree$(\hat{T}_{out}(w), v)$
  3. $\ell_3 \leftarrow \min_{w \in S}$ Query-Tree$(T_{in}(w), u)$ + Query-Tree$(T_{out}(w), v)$
  4. return $\min\{\ell_1, \ell_2, \ell_3\}$

Fig. 2. The new fully dynamic all-pairs shortest paths algorithm.

A formal description of the new algorithm is given in Figure 2. The algorithm is initialized by a call Init$(G, k, t)$, where $G = (V, E)$ is the initial graph and $k$ and $t$ are parameters to be chosen later. Such a call is also made at the beginning of each phase. A set $E'$ of edges, centered at $v$, is added to the graph by a call Insert$(E', v)$. A set $E'$ of edges is deleted by a call Delete$(E')$. A query is answered by calling Query$(u, v)$. A call Build-Trees$(S)$ is used to (re)build shortest paths trees to and from the vertices of $S$.

The call Init-Dec$(G, t)$, in line 1 of Init, initializes the decremental algorithm of [14]. The call Random$(V, (cn \ln n)/k)$, in line 3, chooses the random sample $S$. The call Build-Trees$(S)$, in line 4, constructs the shortest paths trees $T_{in}(w)$ and $T_{out}(w)$, for every $w \in S$. A call Init-Tree$(\hat{T}_{in}(v), E, k)$ (line 4 of Insert) is used to initialize the decremental maintenance of the first $k$ levels of a shortest paths tree $\hat{T}_{in}(v)$ to $v$. Such a tree is updated, following a deletion of a set $E'$ of edges, using a call Delete-Tree$(\hat{T}_{in}(v), E', k)$ (line 4 of Delete). A query Query-Tree$(\hat{T}_{in}(w), u)$ (line 2 of Query) is used to find the distance from $u$ to $w$ in the tree $\hat{T}_{in}(w)$. If $u$ is not in $\hat{T}_{in}(w)$, the value returned is $\infty$. Such a tree-distance query is easily handled in $O(1)$ time. The out-trees $\hat{T}_{out}(v)$ are handled similarly. Finally, a call BFS$(T_{in}(w), E)$ (line 2 of Build-Trees) is used to construct a standard, static, shortest paths tree to $w$. Distances in such trees are again found by calling Query-Tree$(T_{in}(w), u)$ (line 3 of Query).
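The following Python toy mirrors the control flow of Figure 2 under heavy simplifying assumptions, so that the three estimates $\ell_1$, $\ell_2$, $\ell_3$ can be tried out on small graphs. All names are hypothetical. The decremental structure of [14] is replaced by the edge set of the phase minus later deletions; each center's trees are replaced by a frozen copy of the edge set that only loses edges, queried by a fresh BFS truncated at depth $k$; and $S$ is sampled vertex by vertex with probability $\min\{1, (c \ln n)/k\}$ as in Lemma 1. None of the running-time guarantees carry over, only the answers.

```python
import math
import random
from collections import deque


def bfs_dist(n, edges, src, reverse=False, limit=None):
    """BFS distances from src (to src if reverse=True), optionally truncated at depth limit."""
    adj = [[] for _ in range(n)]
    for x, y in edges:
        if reverse:
            adj[y].append(x)
        else:
            adj[x].append(y)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if limit is not None and dist[x] >= limit:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


class ToyDynamicAPSP:
    """Control flow of Figure 2 with naive stand-ins for [14] and [9]; not their running times."""

    def __init__(self, n, edges, k, t, c=2.0, seed=0):
        self.n, self.k, self.t, self.c = n, k, t, c
        self.E = set(edges)
        self.rng = random.Random(seed)
        self._init()

    def _init(self):                                   # Init(G, k, t)
        self.dec_edges = set(self.E)                   # stand-in for Init-Dec: sees deletions only
        self.C = {}                                    # center v -> edge set of its depth-k trees
        p = min(1.0, self.c * math.log(self.n) / self.k)   # sampling rate of Lemma 1
        self.S = [x for x in range(self.n) if self.rng.random() < p]
        self._build_trees()

    def _build_trees(self):                            # Build-Trees(S): exact BFS in the current graph
        self.T_in = {w: bfs_dist(self.n, self.E, w, reverse=True) for w in self.S}
        self.T_out = {w: bfs_dist(self.n, self.E, w) for w in self.S}

    def insert(self, new_edges, v):                    # Insert(E', v); every edge of E' touches v
        self.E |= set(new_edges)
        if len(self.C) >= self.t:
            self._init()                               # phase over: start a new one
        self.C[v] = set(self.E)                        # stand-in for Init-Tree: snapshot, later only shrinks
        self._build_trees()

    def delete(self, old_edges):                       # Delete(E')
        old = set(old_edges)
        self.E -= old
        self.dec_edges -= old
        for tree_edges in self.C.values():             # stand-in for Delete-Tree at every center
            tree_edges -= old
        self._build_trees()

    def query(self, u, v):                             # Query(u, v)
        l1 = bfs_dist(self.n, self.dec_edges, u).get(v, math.inf)
        l2 = min((bfs_dist(self.n, es, w, reverse=True, limit=self.k).get(u, math.inf)
                  + bfs_dist(self.n, es, w, limit=self.k).get(v, math.inf)
                  for w, es in self.C.items()), default=math.inf)
        l3 = min((self.T_in[w].get(u, math.inf) + self.T_out[w].get(v, math.inf)
                  for w in self.S), default=math.inf)
        return min(l1, l2, l3)
```

With $k \ge n$ the depth-$k$ trees see every distance, so by the argument in the proof of Theorem 5 the answers are exact whatever sample $S$ is drawn, which makes the toy convenient for testing.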

Theorem 5. The fully dynamic all-pairs shortest paths algorithm of Figure 2 handles each insert or delete operation in $O(\frac{mn^2 \log n}{t^2} + km + \frac{mn \log n}{k})$ amortized time, and answers each distance query in $O(t + \frac{n \log n}{k})$ worst-case time. Each result returned by the algorithm is correct with a probability of at least $1 - 2n^{-c}$. By choosing $k = (n \log n)^{1/2}$ and $(n \log n)^{1/2} \le t \le n^{3/4} (\log n)^{1/4}$ we get an amortized update time of $O(\frac{mn^2 \log n}{t^2})$ and a worst-case query time of $O(t)$.
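The parameter choices in Theorem 5 can be sanity-checked numerically. The Python snippet below is only an illustration: it drops all constants and uses natural logarithms (which does not affect the $O$-bounds), evaluates the update and query bounds, and confirms that $k = (n \log n)^{1/2}$ equalizes $km$ and $mn \log n / k$, while $t = n^{3/4} (\log n)^{1/4}$ brings the first term down to the same value $m (n \log n)^{1/2}$. The function names are hypothetical.

```python
import math


def update_bound(n, m, k, t):
    """Theorem 5 amortized update bound without constants: mn^2 log n / t^2 + km + mn log n / k."""
    L = math.log(n)
    return m * n * n * L / t ** 2 + k * m + m * n * L / k


def query_bound(n, k, t):
    """Theorem 5 worst-case query bound without constants: t + n log n / k."""
    return t + n * math.log(n) / k


def fastest_update_parameters(n):
    """k = (n log n)^(1/2) and t = n^(3/4) (log n)^(1/4), the choices made in the proof."""
    L = math.log(n)
    return math.sqrt(n * L), n ** 0.75 * L ** 0.25
```

For example, with $n = 10^6$ and $m = 10^7$ all three update terms coincide, so the amortized update bound is three times $m (n \log n)^{1/2}$, and doubling or halving $k$ makes it strictly larger.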

Proof. The correctness proof follows from the arguments outlined alongside the description of the algorithm. As each estimate $\ell_1$, $\ell_2$ and $\ell_3$ obtained while answering a distance query Query$(u, v)$ is equal to the length of a path in the graph from $u$ to $v$, we have $d(u,v) \le \min\{\ell_1, \ell_2, \ell_3\}$. We show that at least one of these estimates is equal, with very high probability, to $d(u,v)$.

If there is a shortest path from $u$ to $v$ that does not use any edge inserted in the current phase, then $d(u,v) = \ell_1$, assuming that the estimate $\ell_1$ returned by the decremental data structure is correct. The error probability here is only $n^{-c}$.

Suppose therefore that there is a shortest path $p$ from $u$ to $v$ that uses at least one edge that was inserted during the current phase. Let $w$ be the latest vertex on $p$ to serve as an insertion center. If $d(u,v) \le k$, then the correct distance from $u$ to $v$ will be found while examining the trees $\hat{T}_{in}(w)$ and $\hat{T}_{out}(w)$. (Every edge of $p$ inserted during the phase touches its insertion center, which lies on $p$; hence all edges of $p$ were present when the trees of $w$ were last initialized, and none of them has been deleted since.)

Finally, suppose that $d(u,v) > k$. Let $p$ be a shortest path from $u$ to $v$ in the current graph. By Lemma 1, with a probability of at least $1 - n^{-c}$ the path $p$ passes through a vertex $w$ of $S$, and the correct distance will be found while examining the trees $T_{in}(w)$ and $T_{out}(w)$.

We next analyze the complexity of the algorithm. By Theorem 4, the total cost of maintaining the decremental data structure is $O(\frac{mn^2 \log n}{t} + mn \log^2 n)$. As each phase is composed of at least $t$ update operations, this contributes $O(\frac{mn^2 \log n}{t^2} + \frac{mn \log^2 n}{t})$ to the amortized cost of each update operation; for $t \le n/\log n$ the first term dominates. Each insert operation triggers the creation (or recreation) of two decremental shortest paths trees that are maintained only up to depth $k$. By Theorem 3 the total cost of maintaining these trees is only $O(km)$. (Note that this also covers the cost of all future operations performed on these trees.) Finally, each insert or delete operation requires the rebuilding of $(cn \ln n)/k$ shortest paths trees at a total cost of $O(\frac{mn \log n}{k})$. The total amortized cost of each update operation is therefore $O(\frac{mn^2 \log n}{t^2} + km + \frac{mn \log n}{k})$, as claimed. Each query is handled by the algorithm in $O(t + \frac{n \log n}{k})$ time. The estimate $\ell_1$ is obtained in $O(t)$ time by querying the decremental data structure. The estimate $\ell_2$ is obtained in $O(t)$ time by considering all the trees associated with $C$. Finally, the estimate $\ell_3$ is obtained in $O(\frac{n \log n}{k})$ time by examining all the trees associated with $S$.

By examining these bounds it is obvious that $k = (n \log n)^{1/2}$ is the optimal choice for $k$. By choosing $t$ in the range $(n \log n)^{1/2} \le t \le n^{3/4} (\log n)^{1/4}$ we get a tradeoff between the update and query times. The fastest update time of $O(m (n \log n)^{1/2})$ is obtained by choosing $t = n^{3/4} (\log n)^{1/4}$. □

Init:
  1. for every $v \in V$,
  2.     $d[v] \leftarrow \infty$; $p[v] \leftarrow$ null; $N[v] \leftarrow \emptyset$
  3. $d[s] \leftarrow 0$

Insert$(u, v)$:
  1. $N[u] \leftarrow N[u] \cup \{v\}$
  2. Scan$(u, v)$

Scan$(u, v)$:
  1. $d' \leftarrow d[u] + wt(u, v)$
  2. if $d' < d[v]$ and $d' \le k$ then
  3.     $d[v] \leftarrow d'$; $p[v] \leftarrow u$
  4.     for every $w \in N[v]$,
  5.         Scan$(v, w)$

Fig. 3. A simple incremental SSSP algorithm.



4 An Incremental SSSP Algorithm and Greedy Spanners 

A simple algorithm for incrementally maintaining a single-source shortest-paths tree from a source vertex $s$ up to distance $k$ is given in Figure 3. The edge weights are assumed to be non-negative integers. The algorithm may be seen as an incremental variant of the algorithm of [9]. It is also similar to an algorithm of Ramalingam and Reps [19]. The algorithm is included here for completeness.

For each vertex $v \in V$, $d[v]$ is the current distance from $s$ to $v$, $p[v]$ is the parent of $v$ in the shortest paths tree, and $N[v]$ is the set of vertices that can be reached from $v$ by following an outgoing edge. The integer weight of an edge $(u,v)$ is denoted by $wt(u,v)$. As described, the algorithm works on directed graphs. It is easy to adapt it to work on undirected graphs. (We simply need to scan each edge in both directions.)
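A direct Python transcription of Figure 3 is sketched below. It assumes that each edge is inserted at most once and that weights are non-negative integers. The recursion of Scan is replaced by an explicit stack, which avoids Python's recursion limit and still rescans every out-edge of $v$ after each decrease of $d[v]$, so the counting argument of Theorem 6 is unaffected. The class name is illustrative.

```python
import math


class IncrementalSSSP:
    """Figure 3: distances from s that are at most k, maintained under edge insertions."""

    def __init__(self, vertices, s, k):
        self.k = k
        self.d = {v: math.inf for v in vertices}   # Init: d[v] <- infinity
        self.p = {v: None for v in vertices}       #       p[v] <- null
        self.N = {v: [] for v in vertices}         #       N[v] <- empty set
        self.wt = {}                               # wt[(u, v)]: integer weight of edge (u, v)
        self.d[s] = 0

    def insert(self, u, v, weight=1):
        """Insert(u, v); assumes (u, v) is inserted at most once."""
        self.N[u].append(v)
        self.wt[(u, v)] = weight
        self._scan(u, v)

    def _scan(self, u, v):
        # Scan(u, v) with an explicit stack in place of the recursive calls of Figure 3.
        stack = [(u, v)]
        while stack:
            a, b = stack.pop()
            d_new = self.d[a] + self.wt[(a, b)]
            if d_new < self.d[b] and d_new <= self.k:
                self.d[b], self.p[b] = d_new, a
                stack.extend((b, c) for c in self.N[b])

    def query(self, v):
        return self.d[v]
```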

Theorem 6. The algorithm of Figure 3 incrementally maintains a shortest-paths tree from a source vertex $s$ up to distance $k$ in a directed unweighted graph using a total number of $O(km)$ operations, where $m$ is the number of edges in the final graph. Each distance query is answered in $O(1)$ time.

Proof. (Sketch) It is easy to see that the algorithm correctly maintains the distances from $s$. The complexity is $O(km)$ as each edge $(u,v)$ is rescanned only when the distance from $s$ to $u$ decreases, and this happens at most $k$ times. □

We next define the notion of spanners. 
Greedy-Spanner$(G, k)$:
  1. $E' \leftarrow \emptyset$
  2. for each edge $(u, v) \in E$, in non-decreasing order of weight, do
  3.     if $\delta_{E'}(u, v) > (2k-1) \cdot wt(u, v)$ then
  3'.    [if $d_{E'}(u, v) > 2k-1$ then]
  4.         $E' \leftarrow E' \cup \{(u, v)\}$
  5. return $G' \leftarrow (V, E')$

Fig. 4. A greedy algorithm for constructing spanners.



Definition 1 (Spanners [16]). Let $G = (V, E)$ be a weighted undirected graph, and let $t \ge 1$. A subgraph $G' = (V, E')$ is said to be a $t$-spanner of $G$ if and only if for every $u, v \in V$ we have $\delta_{G'}(u,v) \le t \cdot \delta_G(u,v)$.
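Definition 1 can be checked by brute force on small graphs. The hypothetical helper below is a sketch, not part of the paper: it computes all distances with the Floyd-Warshall algorithm in $G$ and in the subgraph, with edges given as triples $(u, v, w)$ on vertices $0, \ldots, n-1$, and then compares them pair by pair.

```python
import math


def all_pairs(n, edges):
    """Floyd-Warshall over an undirected weighted edge list [(u, v, w), ...]."""
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        D[u][v] = D[v][u] = min(D[u][v], w)
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][mid] + D[mid][j] < D[i][j]:
                    D[i][j] = D[i][mid] + D[mid][j]
    return D


def is_t_spanner(n, edges, sub_edges, t):
    """Definition 1: delta_{G'}(u, v) <= t * delta_G(u, v) for every pair u, v."""
    DG, DH = all_pairs(n, edges), all_pairs(n, sub_edges)
    return all(DH[u][v] <= t * DG[u][v] for u in range(n) for v in range(n))
```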

The greedy algorithm of Althöfer et al. [1] for constructing sparse spanners of weighted undirected graphs is given in Figure 4. For every integer $k \ge 2$, it constructs a $(2k-1)$-spanner with at most $n^{1+1/k}$ edges. This is an essentially optimal tradeoff between stretch and size. The algorithm is reminiscent of Kruskal's algorithm for the construction of a minimum spanning tree. A naive implementation of this algorithm requires $O(mn^{1+1/k})$ time.

We consider a variant of the algorithm in which line 3 is replaced by line 3'. For every edge $(u,v) \in E$, the original algorithm checks whether $\delta_{E'}(u,v) > (2k-1) \cdot wt(u,v)$, i.e., whether the weighted distance from $u$ to $v$ in the subgraph composed of the edges already selected to the spanner is greater than $2k-1$ times the weight $wt(u,v)$ of the edge. The modified version of the algorithm asks, instead, whether $d_{E'}(u,v) > 2k-1$, i.e., whether the unweighted distance between $u$ and $v$ in the subgraph $(V, E')$ is greater than $2k-1$. We now claim:

Theorem 7. The modified version of the greedy spanner algorithm still produces a $(2k-1)$-spanner with at most $n^{1+1/k}$ edges for any weighted graph on $n$ vertices.

Proof. The claim follows from a simple modification of the correctness proof of the greedy algorithm. If an edge $(u,v)$ is not selected by the modified algorithm, then $d_{E'}(u,v) \le 2k-1$. As the edges are scanned in non-decreasing order of weight, all the edges on a path with the fewest edges connecting $u$ and $v$ in $(V, E')$ are of weight at most $wt(u,v)$, and therefore $\delta_{E'}(u,v) \le (2k-1) \cdot wt(u,v)$. Thus, the edge $(u,v)$ also fails the test of the original algorithm with respect to the current set $E'$. As $E'$ only grows, every edge of $G$ ends up stretched by a factor of at most $2k-1$ in $(V, E')$; replacing each edge of a shortest path of $G$ by such a detour shows that $(V, E')$ is a $(2k-1)$-spanner of $G$.

The proof that the set of edges $E'$ returned by the original algorithm is of size at most $n^{1+1/k}$ relies only on the fact that the girth of $G' = (V, E')$ is at least $2k+1$. This also holds for the set $E'$ constructed by the modified algorithm, as we never add to $E'$ an edge that would form a cycle of size at most $2k$. Hence, the size of the set $E'$ returned by the modified algorithm is also at most $n^{1+1/k}$. □

Theorem 8. The modified greedy algorithm of Figure 4 can be implemented to run in $O(kn^{2+1/k})$ time.
Proof. We use the algorithm of Figure 3 to maintain a tree of shortest paths, up to distance $2k-1$, from each vertex of the graph. As the spanner contains at most $n^{1+1/k}$ edges, the total cost of maintaining each one of these trees is only $O(kn^{1+1/k})$, and the total cost of the algorithm is as claimed. □
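The following Python sketch combines the modified greedy algorithm (line 3') with the approach in the proof of Theorem 8: one incremental BFS tree of depth $2k-1$ per vertex, in the style of Figure 3 with unit weights, updated in both directions whenever an edge enters the spanner. The function name and the $(u, v, w)$ edge format on vertices $0, \ldots, n-1$ are assumptions made for illustration.

```python
import math


def modified_greedy_spanner(n, edges, k):
    """Figure 4 with line 3': keep (u, v) iff the hop distance u -> v in E' exceeds 2k - 1."""
    depth = 2 * k - 1
    adj = [[] for _ in range(n)]                    # adjacency of the spanner (V, E')
    dist = [[math.inf] * n for _ in range(n)]       # dist[r][x]: hop distance r -> x, if <= depth
    for r in range(n):
        dist[r][r] = 0

    def scan(r, a, b):
        # Scan of Figure 3 for the tree rooted at r, unit weights, explicit stack.
        stack = [(a, b)]
        while stack:
            x, y = stack.pop()
            d_new = dist[r][x] + 1
            if d_new < dist[r][y] and d_new <= depth:
                dist[r][y] = d_new
                stack.extend((y, z) for z in adj[y])

    spanner = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # non-decreasing order of weight
        if dist[u][v] > depth:                           # line 3'
            spanner.append((u, v, w))
            adj[u].append(v)
            adj[v].append(u)
            for r in range(n):                           # insert both directions into every tree
                scan(r, u, v)
                scan(r, v, u)
    return spanner
```

On a triangle with a heavy edge $(0, 2)$ and $k = 2$, that edge is skipped because its endpoints are already two hops apart in $E'$.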

Discussion. There are several other algorithms for constructing sparse spanners of weighted graphs. In particular, a randomized algorithm of Baswana and Sen [4] constructs a $(2k-1)$-spanner with $O(kn^{1+1/k})$ edges in $O(m)$ expected time. A randomized $O(mn^{1/k})$ algorithm for constructing such spanners is described in [27]. Why then insist on a faster implementation of the greedy algorithm? The answer is that the greedy algorithm constructs slightly sparser spanners. It produces $(2k-1)$-spanners with at most $n^{1+1/k}$ edges (no big-O is needed here). When $k$ is non-constant, this is significant. When we let $k = \log n$, the greedy algorithm produces an $O(\log n)$-spanner containing only $O(n)$ edges. All other algorithms produce spanners with $\Omega(n \log n)$ edges. It is, of course, an interesting open problem whether such spanners can be constructed even faster.

Another interesting property of the (original) greedy algorithm, shown by [6], is that the total weight of the edges in the $(2k-1)$-spanner that it constructs is at most $O(n^{(1+\epsilon)/(k-1)} \cdot wt(MST(G)))$, for any $\epsilon > 0$, where $wt(MST(G))$ is the weight of the minimum spanning tree of $G$. Unfortunately, this property no longer holds for the modified greedy algorithm. Again, it is an interesting open problem to obtain an efficient spanner construction algorithm that does have this property.

An efficient implementation of a different variant of the greedy algorithm, in 
the setting of geometric graphs, is described in [12]. 

5 Concluding Remarks and Open Problems 

We presented a simple reduction from the static APSP problem for weighted graphs to the offline partially dynamic SSSP problem for weighted graphs, and a simple reduction from the Boolean matrix multiplication problem to the offline partially dynamic SSSP problem for unweighted graphs.

An interesting issue to explore is whether faster partially dynamic SSSP 
algorithms may be obtained if approximate answers are allowed. (For steps in 
this direction, but for the approximate dynamic APSP problem, see [2,3,22].) 

References 

1. I. Althöfer, G. Das, D. Dobkin, D. Joseph, and J. Soares. On sparse spanners of weighted graphs. Discrete & Computational Geometry, 9:81-100, 1993.

2. S. Baswana, R. Hariharan, and S. Sen. Improved decremental algorithms for transitive closure and all-pairs shortest paths. In Proc. of 34th STOC, pages 117-123, 2002.

3. S. Baswana, R. Hariharan, and S. Sen. Maintaining all-pairs approximate shortest 
paths under deletion of edges. In Proc. of 14th SODA, pages 394-403, 2003. 

4. S. Baswana and S. Sen. A simple linear time algorithm for computing $(2k-1)$-spanner of $O(n^{1+1/k})$ size for weighted graphs. In Proc. of 30th ICALP, pages 384-396, 2003.



On Dynamic Shortest Paths Problems 591 



5. T. Chan. Dynamic subgraph connectivity with geometric applications. In Proc. of 
34th STOC, pages 7-13, 2002. 

6. B. Chandra, G. Das, G. Narasimhan, and J. Soares. New sparseness results on graph spanners. Internat. J. Comput. Geom. Appl., 5:125-144, 1995.

7. D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progres- 
sions. Journal of Symbolic Computation, 9:251-280, 1990. 

8. C. Demetrescu and G. Italiano. A new approach to dynamic all pairs shortest 
paths. In Proc. of 35th STOC, pages 159-166, 2003. 

9. S. Even and Y. Shiloach. An on-line edge-deletion problem. Journal of the ACM, 28(1):1-4, 1981.

10. M. Fredman and R. Tarjan. Fibonacci heaps and their uses in improved network 
optimization algorithms. Journal of the ACM, 34:596-615, 1987. 

11. Z. Galil and O. Margalit. All pairs shortest distances for graphs with small integer 
length edges. Information and Computation, 134:103-139, 1997. 

12. J. Gudmundsson, C. Levcopoulos, and G. Narasimhan. Fast greedy algorithm for 
constructing sparse geometric spanners. SIAM J. Comput., 31:1479-1500, 2002. 

13. T. Hagerup. Improved shortest paths on the word RAM. In Proc. of 27th ICALP, 
pages 61-72, 2000. 

14. M. Henzinger and V. King. Fully dynamic biconnectivity and transitive closure. 
In Proc. of 36th FOCS, pages 664-672, 1995. 

15. D. Karger, D. Koller, and S. Phillips. Finding the hidden path: time bounds for all-pairs shortest paths. SIAM Journal on Computing, 22:1199-1217, 1993.

16. D. Peleg and A. Schaffer. Graph spanners. J. Graph Theory, 13:99-116, 1989. 

17. S. Pettie. A new approach to all-pairs shortest paths on real-weighted graphs. Theoretical Computer Science, 312(1):47-74, 2004.

18. S. Pettie and V. Ramachandran. Computing shortest paths with comparisons and 
additions. In Proc. of 13th SODA, pages 267-276, 2002. 

19. G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the 
shortest-path problem. Journal of Algorithms, 21(2):267-305, 1996. 

20. L. Roditty and U. Zwick. Improved dynamic reachability algorithms for directed 
graphs. In Proc. of 43rd FOCS, pages 679-688, 2002. 

21. L. Roditty and U. Zwick. A fully dynamic reachability algorithm for directed 
graphs with an almost linear update time. In Proc. of 36th STOC, pages 184-191, 
2004. 

22. L. Roditty and U. Zwick. Dynamic approximate all-pairs shortest paths in undi- 
rected graphs. In Proc. of 45th FOCS, 2004. To appear. 

23. R. Seidel. On the all-pairs-shortest-path problem in unweighted undirected graphs. 
J. Comput. Syst. Sci., 51:400-403, 1995. 

24. A. Shoshan and U. Zwick. All pairs shortest paths in undirected graphs with 
integer weights. In Proc. of 40th FOCS, pages 605-614, 1999. 

25. M. Thorup. Undirected single-source shortest paths with positive integer weights 
in linear time. Journal of the ACM, 46:362-394, 1999. 

26. M. Thorup. Fully-dynamic all-pairs shortest paths: Faster and allowing negative 
cycles. In Proc. of 9th SWAT, 2004. To appear. 

27. M. Thorup and U. Zwick. Approximate distance oracles. In Proc. of 33rd STOC, 
pages 183-192, 2001. Full version to appear in the Journal of the ACM. 

28. J. Ullman and M. Yannakakis. High-probability parallel transitive-closure algo- 
rithms. SIAM Journal on Computing, 20:100-125, 1991. 

29. R. Yuster and U. Zwick. Fast sparse matrix multiplication. In Proc. of 12th ESA, 
2004. 



Uniform Algorithms for Deterministic 
Construction of Efficient Dictionaries 



Milan Ruzic 

The Faculty of Mathematics, Belgrade, Serbia and Montenegro 
mruzic@eunet.yu



Abstract. We consider the problem of linear space deterministic dictionaries over the universe $\{0,1\}^w$. The model of computation is the unit-cost RAM with word length $w$. Many modern solutions to the dictionary problem are weakly non-uniform, i.e. they require a number of constants to be computed at “compile time” for stated time bounds to hold. We present an improvement in the class of dictionaries free from weak non-uniformity and with stress on faster searches. Various search-update time trade-offs are obtained. The time bounds for the construction of a static dictionary and update bounds for the dynamic case contain a $\log w$ term; the query times are independent of $w$. In the case of optimal query time, we achieve construction time $O(n^{1+\epsilon} \log w)$, for any $\epsilon > 0$. The construction requires division, whereas searching uses multiplication only.



1 Introduction 

A dictionary is a data structure that stores a set of keys $S$ and supports answering of membership queries “Is $x$ in $S$?”. Keys come from the universe $U = \{0,1\}^w$ and may be accompanied by satellite data which can be retrieved in case $x \in S$. A dictionary is called dynamic if it also supports updates of $S$ through insertions and deletions of elements. Otherwise, the dictionary is called static.

There are several characteristics that distinguish various types of dictionaries, the most important being: the amount of space occupied by a dictionary, the time needed for performing a query and the time taken by an update or a construction of the whole dictionary. Here, we consider only dictionaries with realistic space usage of $O(n)$ registers, with $n = |S|$. Search time is given priority over update times.

Our model of computation is the word RAM whose instruction set includes multiplication and division; all instructions can be completed in one unit of time. Elements of $U$ are supposed to fit in one machine word and may be interpreted either as integers from $\{0, \ldots, 2^w - 1\}$ or as bit strings from $\{0,1\}^w$.

Algorithms involved in construction of a dictionary may be randomized - 
they require a source of random bits and their time bounds are usually expected. 
Randomized dictionaries reached the stage of high development and there is 
very little left to be improved. On the other hand, deterministic dictionaries 
with guaranteed time bounds are still evolving. In recent years, a few results in 
this area have made deterministic dictionaries more competitive. However, many 
of these results suffer from weak non-uniformity of the algorithms. Namely, the
code would require a number of constants to be computed at “compile time” for 
stated bounds to hold at “run time”. This is a minor flaw if all constants can be 
computed in time polynomial in w; this is the case with fusion trees [6]. But, de- 
terministic dictionaries with lowest known lookup times [9], [10] require constants 
not known to be computable in polynomial time. This may be a problem even 
for moderate word sizes and is unacceptable for larger ones. We give an efficient 
realization of the dictionary which does not suffer from weak non-uniformity. 

Fredman, Komlós, and Szemerédi [5] showed that it is possible to construct a linear space dictionary with constant lookup time for arbitrary word sizes. This dictionary implementation is known as the FKS scheme. Besides the randomized version, they also gave a deterministic construction algorithm with a running time of O(n^3 w). A bottleneck was the choice of appropriate hash functions. Raman [11] reduced the time for finding a good function to O(n^2 w). Andersson [1] showed how to modify Raman’s solution to give an improved time bound; however, he used fusion trees, which introduced weak non-uniformity into the solution. Later, Hagerup, Miltersen and Pagh [9] achieved a very fast O(n log n) construction time for the static case. They combined several techniques, one of which is error-correcting codes. Yet the problem of finding a suitable code in polynomial time (posed in [8]) is still unresolved. In the field of large-degree trees, which offer balanced lookup and update times, a very efficient structure is described in [2]. It also supports more general predecessor queries.

In this paper we present a family of static and dynamic dictionaries based on a new algorithm for choosing a good hash function. The “quality” of the function can be tuned, so various trade-offs may be obtained. Two interesting combinations are search time O(1) with update time O(n^ε log w), and search time O((log log n)^2) with update time o(n^{6/log log n} log w). These lookup times are not achieved by previous dictionaries that have only the mild form of weak non-uniformity (polynomial-time computable constants). The drawbacks are the presence of a log w term in the update times and the use of division in dictionary construction.

By introducing (less severe) weak non-uniformity, one can remove the dependency on w from the update times. For example, our dictionary with search time O((log log n)^2) can be used when w = O(n), and exponential search trees [1] can be used for larger w. This is an alternative to the error-correcting-codes approach. It has higher update times but a greater range of applicability (with respect to the size of w).

Up to now, Raman’s algorithm was the fastest method for finding a suitable hash function with unconstrained range and a constant-size description.¹ We decrease the dependency on w. Functions usable at the first level of the FKS scheme can be found in time

$$O\left(\min\left\{ n^2 \log^2 n \log w + wn,\ \ n^3 (\log w + \log n) \right\}\right).$$

Perfect hash functions can be found in time O(n^2 log^2 n log w).

¹ Hagerup et al. [9] gave a faster weakly non-uniform algorithm for finding appropriate functions with range Ω(n^2). The method also contains Raman’s construction.
2 Family of Functions



Multiplicative hashing families can generally be regarded as families of the type

$$\mathcal{H}_A = \{\, h_a(x) = \lfloor r \cdot \operatorname{frac}(ax) \rfloor \mid a \in A \,\} \qquad (1)$$

where frac(x) denotes x − ⌊x⌋. Each member of H_A maps U to {0, …, r − 1}. Functions of this type are used in one of the oldest hashing heuristics, known as “the multiplication method”. The universal family (shown in [4])

$$\{\, (ax \bmod 2^w) \operatorname{div} 2^{w-l} \mid a \in U,\ a \text{ odd} \,\} \qquad (2)$$

with range r = 2^l, is of type (1). Allowing a to be wider than w bits, the same holds for

$$\{\, \lfloor (kx \bmod p) \cdot r/p \rfloor \mid k \in U,\ p \text{ prime and } p > 2^w \,\},$$

a variation of the well-known family [5].
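To see concretely that both discrete families are of type (1), we can compare them with the real-valued form on a small universe. The Python sketch below is our own illustration, not code from the paper. The helper names are hypothetical, and exact `fractions.Fraction` arithmetic stands in for the real multiplier a. Family (2) is taken with range r = 2^l and multiplier a/2^w; the prime family uses multiplier k/p.

```python
import math
from fractions import Fraction


def frac(x: Fraction) -> Fraction:
    """Fractional part x - floor(x)."""
    return x - math.floor(x)


def h_real(a: Fraction, x: int, r: int) -> int:
    """Member of family (1): h_a(x) = floor(r * frac(a * x))."""
    return math.floor(r * frac(a * x))


def h_mult_shift(a: int, x: int, w: int, l: int) -> int:
    """Member of family (2): (a*x mod 2^w) div 2^(w-l), range {0, ..., 2^l - 1}."""
    return ((a * x) % (1 << w)) >> (w - l)


def h_mod_prime(k: int, x: int, p: int, r: int) -> int:
    """Member of the prime family: floor((k*x mod p) * r / p), range {0, ..., r - 1}."""
    return ((k * x) % p) * r // p


if __name__ == "__main__":
    w, l, p, r = 6, 3, 67, 10        # small word size; 67 is a prime above 2^6
    same_mult = all(h_mult_shift(a, x, w, l) == h_real(Fraction(a, 2 ** w), x, 2 ** l)
                    for a in range(1, 2 ** w, 2) for x in range(2 ** w))
    same_prime = all(h_mod_prime(k, x, p, r) == h_real(Fraction(k, p), x, r)
                     for k in range(2 ** w) for x in range(2 ** w))
    print(same_mult, same_prime)     # True True
```

Exact rationals avoid floating-point rounding in frac(ax), which would otherwise blur the comparison.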

For easier notation, by Q_v we denote {u/2^v | 0 < u < 2^v, u integer}. The set Q_v can be regarded as the set of numbers from (0, 1) given in precision 2^{−v}. Our dictionaries use functions from H_{Q_v}, where v = O(w). As we will see, the discrete families mentioned above inherit a good distribution of elements from their real extension H_ℝ = {h_a : U → ℤ_r | a ∈ ℝ}. The following simple result is the basis of our construction methods.



Lemma 1. If x ≠ y and

$$a \in \bigcup_{k \in \mathbb{Z}} \left( \frac{k + 1/r}{|x - y|},\ \frac{k + 1 - 1/r}{|x - y|} \right), \qquad (3)$$

then h_a(x) ≠ h_a(y), for h_a ∈ H_ℝ.

Proof. For h_a(x) ≠ h_a(y) to hold, it is sufficient that |frac(ax) − frac(ay)| > 1/r. In the case x > y we have

$$\operatorname{frac}(a|x-y|) = \operatorname{frac}(ax - ay) = \operatorname{frac}\big(u_1 + \operatorname{frac}(ax) - (u_2 + \operatorname{frac}(ay))\big),$$

where u_1 = ⌊ax⌋ and u_2 = ⌊ay⌋ are integers. If, additionally, frac(ax) ≥ frac(ay), the condition becomes frac(a|x − y|) > 1/r. When frac(ax) < frac(ay), we need frac(a|x − y|) < 1 − 1/r. The case y > x makes no difference. It is easy to verify that frac(a|x − y|) ∈ (1/r, 1 − 1/r) is equivalent to (3). □
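Lemma 1 can be sanity-checked by brute force on a tiny universe. The sketch below uses a hypothetical helper, `lemma1_counterexample`. It enumerates every a ∈ Q_v and every pair x < y, and tests condition (3) in its equivalent form frac(a|x − y|) ∈ (1/r, 1 − 1/r). It then looks for a collision. This is an illustration under small parameters, not part of the construction.

```python
import math
from fractions import Fraction


def frac(x: Fraction) -> Fraction:
    """Fractional part x - floor(x)."""
    return x - math.floor(x)


def h(a: Fraction, x: int, r: int) -> int:
    """h_a(x) = floor(r * frac(a * x)) from family (1)."""
    return math.floor(r * frac(a * x))


def lemma1_counterexample(universe: range, r: int, v: int):
    """Search all a in Q_v and pairs x < y for a violation of Lemma 1.

    Condition (3) is tested as frac(a*(y-x)) in (1/r, 1 - 1/r).
    Returns None if the lemma holds on this finite grid, else (a, x, y).
    """
    lo, hi = Fraction(1, r), 1 - Fraction(1, r)
    for u in range(1, 2 ** v):
        a = Fraction(u, 2 ** v)
        for x in universe:
            for y in universe:
                if x < y and lo < frac(a * (y - x)) < hi and h(a, x, r) == h(a, y, r):
                    return (a, x, y)
    return None


if __name__ == "__main__":
    print(lemma1_counterexample(range(16), 5, 6))  # None
```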



Proposition 1. Let C be the set of all a from [0, 1] such that the number of pairs {x, y} ⊆ S, x ≠ y, for which h_a(x) = h_a(y) is less than m. Then C has measure at least

$$1 - \min\left\{ \frac{n(n-1)}{rm},\ 1 \right\}.$$
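Proof. We denote the intersection of [0, 1] and the set given in (3) by A_{xy}. Let K_{xy} : [0, 1] → {0, 1} be the characteristic function of the set A_{xy}^c = [0, 1] \ A_{xy}. The number of collisions for a fixed a is at most Σ_{x<y} K_{xy}(a); the inequality holds because condition (3) is sufficient but not necessary for avoiding a collision between x and y. We use the more standard Riemann integral because it equals the Lebesgue integral in this case. The set A_{xy}^c consists of intervals of length 2/(r|x − y|) centred at the points k/|x − y|, with only half of each extreme interval inside [0, 1], so its total length is 2/r. Hence

$$\int_0^1 \sum_{x<y} K_{xy}(t)\,dt = \sum_{x<y} \int_0^1 K_{xy}(t)\,dt = \binom{n}{2} \cdot \frac{2}{r} = \frac{n(n-1)}{r}.$$

Therefore the measure of the set {a ∈ [0, 1] | Σ_{x<y} K_{xy}(a) ≥ m} is at most min{n(n − 1)/(rm), 1}. □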

From the previous result we also observe that the “goodness” of the mentioned families is not essentially due to number-theoretic properties; of course, the way a discrete subfamily of H_ℝ is chosen does matter. For example, taking m = n yields functions usable at the first level of the FKS scheme. If the range is set to 2n, then more than “half” of the functions from H_ℝ would be usable. The following variation of Proposition 1 is used directly in Sect. 3. The proof is similar; we only adopt a designation that is also needed later: A_d = A_{xy} for |x − y| = d.



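Lemma 2. Let D = (d_k)_{k=1}^{s}, where d_k are elements of U \ {0}. Also, let m be a positive integer, and define the sets²

$$B = \left\{\, a \in [0,1] \;\middle|\; \left|\{ k \mid a \in A_{d_k}^c \}\right| < m \,\right\},$$

$$C = \left\{\, a \in [0,1] \;\middle|\; \left|\{ k \mid (\exists x \in U)(x + d_k \in U \wedge h_a(x) = h_a(x + d_k)) \}\right| < m \,\right\}.$$

Then B ⊆ C and the measure of B is at least 1 − min{2s/(rm), 1}.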

We will make an important convention that greatly simplifies notation. The set A_d^c consists of disjoint and evenly spaced intervals. We call them support intervals because they form the support of the set’s characteristic function. Ends of support intervals are generally not in Q_{λw}, for an integer constant λ which will be determined by Theorem 1; Q_{λw} is the set in which we look for an appropriate a. From now on, any computation involving a support interval is performed on the interval obtained by rounding up the support interval’s left end, and rounding down its right end, to a number in Q_{λw}. Some care must be taken. For example, the bounds of a support interval cannot be determined by adding 1/d (which is also rounded) to the bounds of the previous interval. The set derived from A_d^c by rounding its support intervals still contains all unsuitable a’s from our discrete set. Therefore, all calculated measures remain valid for our goal.



Lemma 3. Given d ∈ U \ {0} and b, c ∈ Q_{λw}, the measure of [b, c] ∩ A_d^c can be computed in O(1) time with the use of division.

² Clearly, if h_a(x) = h_a(x + d_k) for some x, then it holds for all allowed x.



Proof. Let the support intervals of A_d^c be [b_k, c_k], where 0 ≤ k ≤ d. Let i = max{k | b_k ≤ b} and s = min{k | c_k ≥ c}. Both can be determined in constant time, since division by 1/d amounts to multiplication by d. If i = s, the result is c − b. Otherwise, we compute the lengths of [b, c] ∩ [b_i, c_i] and [b, c] ∩ [b_s, c_s]. To their sum we add the total length of the s − i − 1 support intervals lying strictly between them, which gives the measure. The division can be accomplished in constant time since both operands are O(w)-bit integers. □
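One way to realize Lemma 3 is through a prefix function: compute the measure of [0, x] ∩ A_d^c in closed form and subtract two such values. This is our own formulation rather than the i/s bookkeeping of the proof. It assumes r ≥ 2 and ignores the rounding of support intervals to Q_{λw}. The names `bad_prefix` and `bad_measure` are hypothetical.

```python
import math
from fractions import Fraction


def bad_prefix(x: Fraction, d: int, r: int) -> Fraction:
    """Measure of [0, x] ∩ A_d^c for 0 <= x <= 1, assuming r >= 2.

    A_d^c = {a in [0, 1] : frac(a*d) in [0, 1/r] ∪ [1 - 1/r, 1)}: support
    intervals of length 2/(rd) centred at k/d (halved at 0 and at 1).
    """
    t = x * d                 # x measured in units of the spacing 1/d
    k = math.floor(t)         # complete units in [0, t]; each contributes 2/r
    f = t - k                 # position inside the current, partial unit
    inv_r = Fraction(1, r)
    partial = min(f, inv_r) + max(Fraction(0), f - (1 - inv_r))
    return (k * 2 * inv_r + partial) / d   # rescale from units of 1/d


def bad_measure(b: Fraction, c: Fraction, d: int, r: int) -> Fraction:
    """Measure of [b, c] ∩ A_d^c with O(1) arithmetic operations (0 <= b <= c <= 1)."""
    return bad_prefix(c, d, r) - bad_prefix(b, d, r)


if __name__ == "__main__":
    print(bad_measure(Fraction(0), Fraction(1), 5, 8))            # 1/4, i.e. 2/r
    print(bad_measure(Fraction(1, 10), Fraction(3, 10), 5, 8))    # 1/20
```

For instance, `bad_measure(Fraction(0), Fraction(1), d, r)` returns 2/r for every d. This is the total length of the support intervals used in the proof of Proposition 1.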



Lemma 4. Let B be the set defined in Lemma 2. If c − b < 2/(r · max D), then B ∩ [b, c] consists of at most m different intervals.

Proof. Only one support interval from each set A_{d_k}^c can have a nonempty intersection with [b, c], and it cannot lie completely inside [b, c] because it is longer. Let c_1 ≤ c_2 ≤ … ≤ c_t be the increasing sequence of right ends of the specified intervals that have nonempty intersection with [b, c]. Also let q = max{0, t − m + 1} and c_0 = b. Then [b, c_q] is not in B. Suppose that [a_1, a_2] is not in B and that (a_2, a_3) is. The number of collisions decreases at a_2, and this can happen only at one of the points c_{q+i}, 0 ≤ i < m. Thus there are at most m left boundary points of B ∩ [b, c], so it consists of at most m different intervals. □



3 Finding a Good Function 

The main algorithm, described in Theorem 1, accepts as input a finite sequence of values from {|x − y| | x, y ∈ S}. For simplicity we assume that the elements of the sequence are distinct; therefore, we may use set notation. If a value occurs several times within the sequence, it is processed every time; no part of the further discussion is affected by this.

We denote the input set (sequence) of differences by D. The algorithm needs to process the elements of D in increasing order several times. Storing D in memory would generally be unacceptable. However, the types of subsets of {|x − y| | x, y ∈ S} used by our dictionaries can be enumerated in increasing order using only O(n) space. In other words, after calling a function that initializes some common variables, another function can serve the elements of D one by one, as claimed by the next lemma.

Lemma 5. Let S = {x_1, x_2, …, x_n}, with x_i < x_{i+1}. Further, let (n_k)_{k=1}^{n} be a given sequence of positive integers with n_k < n/2, and let³

$$D = \bigcup_{k=1}^{n} \bigcup_{j=1}^{n_k} \left\{\, |x_{k +_n j} - x_k| \,\right\}. \qquad (4)$$

Using linear space, we can go through the elements of D in increasing order, with each retrieval taking time O(log n).

³ By x +_n y we denote ((x + y − 1) mod n) + 1. For convenience, we set the range of the function to be {1, …, n}.
Proof. At the beginning, the elements of S are sorted. Candidates for the minimum of D are differences of the type |x_{k +_n 1} − x_k| and |x_k − x_{k +_n n_k}|. Initially, for each x_k the smaller of the two is inserted into a priority queue. Each difference is accompanied by the index of the element to which it is attributed. Further, for each x_k two indices are maintained; call them l_k and r_k. They record that |x_k − x_{l_k}| and |x_{r_k} − x_k| are the last “left” and “right” differences attributed to x_k that were inserted into the queue. Suppose the currently smallest difference we fetch is |x_{r_j} − x_j|. We then take the smaller of the next “left” and “right” differences related to x_j and insert it into the queue. At all times there are O(n) elements in the heap. The indices l_j and r_j are not wrapped when they hit their bounds, 1 and n.

A simple binary heap can be used for the priority queue, because there would be no considerable gain from using something more sophisticated. Similarly, a simple O(n log n) sorting algorithm is adequate. □
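For every x_k, the enumeration in the proof of Lemma 5 merges two sorted streams of differences: the unwrapped “right” partners and the wrapped “left” partners. Each element has at most one pending heap entry. The following Python sketch is our own illustration of that idea using the standard `heapq` module. The function name is hypothetical, indices are 0-based, and the inputs are assumed to satisfy the hypotheses of the lemma.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def differences_in_order(xs: Sequence[int], nk: Sequence[int]) -> Iterator[int]:
    """Yield |xs[(k + j) % n] - xs[k]| for 0 <= k < n, 1 <= j <= nk[k], in increasing order.

    Assumes xs is strictly increasing and 1 <= nk[k] < n / 2 (0-based indices).
    """
    n = len(xs)
    # Element k has two sorted streams of differences:
    #   right (no wrap-around): partners k+1, ..., min(n-1, k+nk[k]);
    #   left (wrapped):         partners k+nk[k]-n, ..., 1, 0 (empty if k+nk[k] < n).
    right_next = [k + 1 for k in range(n)]
    right_last = [min(n - 1, k + nk[k]) for k in range(n)]
    left_next = [k + nk[k] - n for k in range(n)]
    heap: List[Tuple[int, int]] = []

    def push_next(k: int) -> None:
        """Insert the smaller of the next right and next left difference of element k."""
        cand_r = xs[right_next[k]] - xs[k] if right_next[k] <= right_last[k] else None
        cand_l = xs[k] - xs[left_next[k]] if left_next[k] >= 0 else None
        if cand_r is not None and (cand_l is None or cand_r <= cand_l):
            heapq.heappush(heap, (cand_r, k))
            right_next[k] += 1
        elif cand_l is not None:
            heapq.heappush(heap, (cand_l, k))
            left_next[k] -= 1

    for k in range(n):                   # at most one pending entry per element
        push_next(k)
    while heap:
        d, k = heapq.heappop(heap)       # O(log n) per retrieval
        yield d
        push_next(k)                     # refill from the streams of the same element


if __name__ == "__main__":
    print(list(differences_in_order([1, 3, 7, 12, 20], [2] * 5)))
    # [2, 4, 5, 6, 8, 9, 11, 13, 17, 19]
```

Because a Python generator keeps its state between calls, iteration can be paused and resumed rather than restarted from the smallest difference.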

Note that the state of the process can be saved at any desired point during the iteration, so we do not have to start from the beginning every time. In the following theorem we implicitly assume that the input set D can be enumerated as described.

Theorem 1. Let D = {d_1, d_2, …, d_s}, where d_k are elements of U \ {0}, and let v = w + ⌈lg m⌉ + ⌈lg r⌉ + 3. If rm ≥ 4s,⁴ we can find a ∈ Q_v ∩ B (B is defined in Lemma 2) with a deterministic algorithm that uses space O(n + m) and runs in time

$$O\left(\min\left\{ s \log n (\log w + \log r) + s \log m + wm,\ \ s (\log w + \log r)(m + \log n) \right\}\right).$$

Proof. We first use a bit-by-bit construction to sufficiently narrow the interval in which we seek a, and then we explicitly determine the intersection of B and that small interval. In some cases, several bits may be set at once.

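At the start, no bits are selected and the active interval is (0, 1). Suppose we have come down to the interval (a_1, a_2) and have selected the p most significant bits. We partition the set of differences into three classes, D = D_big ∪ D_mid ∪ D_sml, where

$$D_{big} = \left\{ d \in D \;\middle|\; \frac{2}{rd} > a_2 - a_1 \right\}, \qquad D_{sml} = \left\{ d \in D \;\middle|\; \frac{2}{rd} < \frac{a_2 - a_1}{2(w + \log r) r} \right\},$$

and D_mid = D \ (D_big ∪ D_sml). Since the support intervals of A_d^c have length 2/(rd), a smaller d corresponds to a bigger support interval.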

We maintain information about (a_1, a_2) ∩ A_d^c for all d ∈ D_big. If the intersection is nonempty, it is completely determined by the right end of a support interval, or by its left end, or else (a_1, a_2) ⊆ A_d^c. Support intervals that cover (a_1, a_2) are irrelevant to the choices made by the algorithm. To account for partial intersections, two ordered sequences are stored: one for right ends and one for left ends. Observe that we need information only about the last m right ends and the first m left ends. Let b_1 be the m-th largest right end and b_2 be the m-th smallest left end (if they do not exist, take b_1 = a_1 and/or b_2 = a_2). The elements of (a_1, b_1] and [b_2, a_2) are certainly not in B. We can employ, for example, a B-tree structure to maintain this information. If the number of elements in the tree that stores right ends grows to m + 1, the smallest element is deleted as it becomes irrelevant. The same goes for the tree that holds left ends, except that the largest element is deleted. The structure is updated as p increases.

⁴ A theorem with the requirement 2s/(rm) ≤ c < 1, for a constant c, could be proved similarly. Only the constants hidden in the time bound and the word length of a would be affected.

Let b = (a_1 + a_2)/2. We estimate m(B^c ∩ (a_1, b)) and m(B^c ∩ (b, a_2)),⁵ and the interval with the lower value (greater estimated intersection with B) wins; the p-th bit is set accordingly (bit positions are zero-based). From the requirements of the theorem and our construction, it will follow that B ∩ (a_1, a_2) ≠ ∅; hence b_1 < b_2. If b_2 ≤ b, then (a_1, b) is chosen; if b_1 ≥ b, then (b, a_2) is chosen. Now suppose that b_1 < b < b_2. As in the proof of Lemma 2 (Proposition 1):

$$m(B^c \cap (a_1, b)) = m((a_1, b_1]) + m(B^c \cap (b_1, b)) \le b_1 - a_1 + \frac{1}{m} \sum_{k=1}^{s} m(A_{d_k}^c \cap (b_1, b))$$

$$= b_1 - a_1 + \frac{1}{m} \left( \sum_{d \in D_{big}} m(A_d^c \cap (b_1, b)) + \sum_{d \in D_{mid}} m(A_d^c \cap (b_1, b)) + \sum_{d \in D_{sml}} m(A_d^c \cap (b_1, b)) \right).$$

The sum over the elements of D_big can be computed in time O(m). The terms in the middle sum can be computed in constant time per set (Lemma 3). The last sum can be estimated by (2/r)|D_sml|(b − b_1). The error produced by this estimate can be bounded by using the upper bound on the length of one support interval for each member of D_sml:

$$\frac{1}{m} \sum_{d \in D_{sml}} \frac{2}{rd} < \frac{1}{m} \sum_{d \in D_{sml}} \frac{a_2 - a_1}{2(w + \log r) r} \le \frac{s(a_2 - a_1)}{2(w + \log r) r m}.$$

Summing all obtained values, we get an estimate for m(B^c ∩ (a_1, b)); we denote it by μ_1^{(p+1)}. The other interval (b, a_2) is processed analogously, and the resulting estimate is denoted by μ_2^{(p+1)}. Note that the error bound E^{(p)} = s(a_2 − a_1)/(2(w + log r) r m) covers the sum of the errors made on both intervals, because the intervals are consecutive.

The same kind of measure estimate performed for (a_1, a_2) in the previous step is denoted by μ^{(p)}: the value corresponding to the winning interval in the p-th step. Similarly for E^{(p)}. Since we halve the active interval in each step, a_2 − a_1 = 2^{−p}. Observe that the difference μ^{(p)} + E^{(p)} − m(B^c ∩ (a_1^{(p)}, a_2^{(p)})) (relative to 2^{−p}) decreases as p increases, because elements fall out of D_sml and into D_big (though not directly). Namely:

$$\mu_1^{(p+1)} + \mu_2^{(p+1)} \le \mu^{(p)} + E^{(p)}.$$

Setting μ^{(p+1)} = min{μ_1^{(p+1)}, μ_2^{(p+1)}} gives the recurrence

$$\mu^{(p+1)} \le \frac{1}{2}\left(\mu^{(p)} + E^{(p)}\right) \le \frac{1}{2}\left(\mu^{(p)} + \frac{s}{2^{p+1}(w + \log r) r m}\right).$$

⁵ Do not confuse m(X), the measure set function, with m, the number of allowed collisions.

An upper bound on μ^{(p+1)} is then

$$\mu^{(p+1)} \le \frac{\mu^{(0)}}{2^{p+1}} + \sum_{i=0}^{p} \frac{1}{2^{p+1-i}} \cdot \frac{s}{2^{i+1}(w + \log r) r m} = \frac{\mu^{(0)}}{2^{p+1}} + \frac{(p+1)\, s}{2^{p+2}(w + \log r) r m}. \qquad (6)$$

The initial bound for m(B^c ∩ (0, 1)) is given by Lemma 2: μ^{(0)} = 2s/(rm) ≤ 1/2. The bit-by-bit construction, i.e. the narrowing of the active interval, is carried out until p = w + ⌈lg r⌉ = q. Substituting in (6) gives

$$m(B^c \cap (a_1^{(q)}, a_2^{(q)})) \le \mu^{(q)} \le 2^{-q-1} + 2^{-q-3} + 2^{-q-3} = \frac{3}{4}\, m((a_1^{(q)}, a_2^{(q)})). \qquad (7)$$

When p = q, we have D = D_big. From Lemma 4 and (7) it follows that B ∩ (a_1^{(q)}, a_2^{(q)}) must contain an interval of length at least (1/(4m)) 2^{−q}. Hence, using the B-trees, in time O(m) we can find such an interval and, within it, an element of Q_{w + ⌈lg r⌉ + ⌈lg m⌉ + 3}.

Non-constant space requirements come from the B-trees and the heap used for retrieving members of D. They need space O(m) and O(n), respectively. It remains to show the total time bound. There are O(w) bit-choosing steps, and elements of D_sml^{(p)} are never considered. Elements of D_mid^{(p)} are processed in time O(|D_mid^{(p)}| log n), where the log n factor comes from retrieving a difference from the heap. We show that an element of D can stay in D_mid for at most O(log w + log r) steps. An element d is in D_mid^{(p)} iff

$$\frac{2^{-p}}{2(w + \log r) r} \le \frac{2}{rd} \le 2^{-p} \iff O(w) \cdot 2^{p+2} \ge d \ge \frac{2^{p+1}}{r} \implies \lg d - O(\lg w) \le p \le \lg d + O(\lg r).$$

Each element eventually enters D_big, and the update time per element is O(log m). Thus the total load coming from D_mid is O(s log n (log w + log r) + s log m).

In any step for which D_mid^{(p)} ≠ ∅, we use the tree structures to acquire the contribution of the elements of D_big^{(p)} to μ_1^{(p+1)} and μ_2^{(p+1)}. This takes O(m) time since we process the elements sequentially, and the procedure occurs at most

$$\min\{\, O(w),\ O(s(\log w + \log r)) \,\}$$

times. Finally, consider the case D_mid^{(p)} = ∅. We prefetch the next d_k and determine in which step it will first be processed. That is, we find the smallest p_1 > p such that D_mid^{(p_1)} ≠ ∅; p_1 can be computed in time O(log w), and this is done at most once per element of D. Next, we find an appropriate interval of length 2^{−p_1} with ends in Q_{p_1} which has minimal μ^{(p_1)}. This takes O(m) time; it requires an observation similar to the one in the proof of Lemma 4. Multiple bits are set at once if p_1 > p + 1. D_mid^{(p)} can be empty at most min{O(w), s} times. Adding all the bounds proves the theorem. □
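The bit-by-bit narrowing can be illustrated by a deliberately simplified variant, written here in Python with exact rationals. In this sketch (function names hypothetical), the quantity Σ_d m(A_d^c ∩ I) is recomputed from scratch for both halves of the active interval I in every step, using the prefix formula for the measure of A_d^c. The D_big/D_mid/D_sml classification, the estimates and the B-trees of the proof are all omitted, so the sketch does not meet the time bound of Theorem 1. Under our reading, keeping the half with the smaller sum keeps the average number of bad differences on I at most 2s/r ≤ m/2. B therefore still covers at least half of the final interval of length 2^{−q}, and Lemma 4 then implies that Q_v contains a suitable point. A direct scan of the interior grid points replaces the O(m) B-tree search.

```python
import math
from fractions import Fraction
from typing import Optional, Sequence


def bad_prefix(x: Fraction, d: int, r: int) -> Fraction:
    """m([0, x] ∩ A_d^c) for 0 <= x <= 1 (prefix formula, r >= 2)."""
    t = x * d
    k = math.floor(t)
    f = t - k
    inv_r = Fraction(1, r)
    return (k * 2 * inv_r + min(f, inv_r) + max(Fraction(0), f - (1 - inv_r))) / d


def bad_count(a: Fraction, ds: Sequence[int], r: int) -> int:
    """Number of k with a in A_{d_k}^c, i.e. frac(a*d_k) outside (1/r, 1 - 1/r)."""
    inv_r = Fraction(1, r)
    count = 0
    for d in ds:
        f = a * d - math.floor(a * d)
        if not (inv_r < f < 1 - inv_r):
            count += 1
    return count


def find_multiplier(ds: Sequence[int], r: int, m: int, w: int) -> Optional[Fraction]:
    """Simplified search for a in Q_v ∩ B (B as in Lemma 2), v = w + ceil(lg r) + ceil(lg m) + 3.

    Not the algorithm of Theorem 1: the potential sum_d m(A_d^c ∩ half) is
    recomputed for both halves in O(s) per step, without D_big/D_mid/D_sml
    or B-trees, so only the bit-by-bit narrowing idea is illustrated.
    """
    assert r >= 2 and r * m >= 4 * len(ds)
    assert all(0 < d < 2 ** w for d in ds)
    q = w + (r - 1).bit_length()          # q = w + ceil(lg r)
    v = q + (m - 1).bit_length() + 3      # v = w + ceil(lg r) + ceil(lg m) + 3
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(q):                    # choose the next bit of a
        mid = (lo + hi) / 2
        left = sum(bad_prefix(mid, d, r) - bad_prefix(lo, d, r) for d in ds)
        right = sum(bad_prefix(hi, d, r) - bad_prefix(mid, d, r) for d in ds)
        if left <= right:                 # keep the half with the smaller potential
            hi = mid
        else:
            lo = mid
    scale = 2 ** v                        # scan interior points of Q_v in (lo, hi)
    for u in range(int(lo * scale) + 1, int(hi * scale)):
        a = Fraction(u, scale)
        if bad_count(a, ds, r) < m:
            return a
    return None


if __name__ == "__main__":
    ds = [1, 2, 3, 5, 7]                  # example differences, all below 2^3
    a = find_multiplier(ds, r=16, m=2, w=3)
    print(a, bad_count(a, ds, 16))
```

Take D = {1, 2, 3, 5, 7}, r = 16, m = 2 and w = 3, so that rm = 32 ≥ 4s = 20. The call then returns some a ∈ Q_11 for which at most one difference is bad.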
We remark that the algorithm itself is not complicated, despite the lengthy justification of its actions. The auxiliary structures used are the classical priority queue and a slightly modified B-tree. As a result, the overall implementation complexity is not high, and using more efficient structures would bring no significant gain. The disadvantages of the algorithm are the use of division and the bit length of the resulting value a. For typical relations among the parameters w, m and r, evaluating the function h_a would require 2-3 multiplications.
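Once a = A/2^v has been found, evaluating h_a needs only integer operations, because frac(ax) = (Ax mod 2^v)/2^v. The sketch below uses a hypothetical helper, `h_fixed_point`, and shows this with Python's unbounded integers. On a word RAM with v > w, the product Ax would span more than one word and need multi-word arithmetic, which the sketch does not model.

```python
import math
from fractions import Fraction


def h_fixed_point(A: int, v: int, x: int, r: int) -> int:
    """Evaluate h_a(x) = floor(r * frac(a * x)) for a = A / 2^v in Q_v.

    Since a*x = A*x / 2^v, frac(a*x) = (A*x mod 2^v) / 2^v, so the value is
    (r * (A*x mod 2^v)) >> v: two multiplications, one mask and one shift.
    """
    low = (A * x) & ((1 << v) - 1)    # A*x mod 2^v, i.e. frac(a*x) scaled by 2^v
    return (r * low) >> v             # floor(r * low / 2^v)


if __name__ == "__main__":
    v, r = 12, 37
    A = 2741                          # hypothetical multiplier, a = 2741 / 4096
    for x in (1, 2, 1000, 65535):
        exact = math.floor(r * (Fraction(A * x, 2 ** v) % 1))
        assert h_fixed_point(A, v, x, r) == exact
    print("fixed-point evaluation matches the exact definition")
```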



4 Uniform Dictionaries 



Theorem 2. There is a linear space static dictionary with query time O(1) and deterministic construction time O(n^{1+ε} log w), for any fixed ε > 0.



Proof. We employ a multi-level hashing scheme. Choose some ε_1 such that 0 < ε_1 < ε/2. Initially, set n_k = n^{ε_1}, 1 ≤ k ≤ n (recall Lemma 5), and set the range to r = n. Also, we allow m = 4n^{ε_1} collisions, thereby satisfying the condition 2s/(rm) ≤ 1/2 of Theorem 1. The algorithm described in Theorem 1 returns a value a. We use the time bound that contains only a log w factor (and no factor w). Since log n = o(n^δ) for every fixed δ > 0, a time bound for the first level is O(n^{1+ε} log w).

Define P_x = {y ∈ S | |x − y| ∈ D ∧ h_a(x) = h_a(y)}; then |⋃_{k=1}^{n} P_{x_k}| < m. Suppose that h_a(x) = z for some x ∈ S. At least n^{ε_1} − |P_x| keys cannot hash to slot z. Take y ∈ S such that y ∉ P_x, x ∉ P_y and h_a(y) = z. Then another⁶ n^{ε_1} − |P_y| keys cannot hash to slot z. An induction argument gives an upper bound of n^{1−ε_1} + m on the longest chain. Without loss of generality we may assume ε_1 < 1/2, so the length of the longest chain becomes O(n^{1−ε_1}). If the size of a bucket is less than n^{ε_1}, the parameter settings (r, m) of the FKS scheme are chosen and the bucket is resolved in two steps. Otherwise, the above procedure is applied recursively to the bucket. It takes at most

$$\frac{\log \varepsilon_1}{\log(1 - \varepsilon_1)}$$

levels before the FKS scheme is employed.

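If C(N) is the total construction time for a bucket of size N, excluding the times for the FKS constructions, it is bounded by the recurrence inequality

$$C(N) \le O(N^{1+\varepsilon} \log w) + \sum_i C(N_i) \implies C(N) = O(N^{1+\varepsilon} \log w),$$

where the N_i are the sizes of the buckets at the next level, Σ_i N_i = N and N_i = O(N^{1−ε_1}). The total time for the FKS constructions is at most

$$O\left(n \cdot n^{2\varepsilon_1} (\log w + \log n)\right) = O(n^{1+\varepsilon} \log w).$$

□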

The static dictionary described in Theorem 2 can be dynamized in the same way as described in [8].



⁶ This follows from the structure of D, given by (4).
Theorem 3. Let t : ℕ → ℕ be a nondecreasing function computable in time and space O(n), with 4 ≤ t(n) ≤ O(√(log n / log log n)). There exists a linear space static deterministic dictionary with query time O(t(n) log log n) and construction time

$$O\left(n^{1+2/t(n)}\, t^2(n) \log n \log\log n\, (\log w + \log n)\right).$$

Proof. Construction proceeds as in Theorem 2, except that we do not use the FKS scheme; the construction continues until the bucket size drops below log n. Comparison-based methods finish the search. The parameters for a bucket of size N are set as follows:⁷

$$s = N^{1+1/t(n)}, \qquad r = \frac{N}{t(n) \log\log n}, \qquad m = 4N^{1/t(n)}\, t(n) \log\log n.$$

Note that rm = 4s, as required by Theorem 1. For the longest chain to be O(N^{1−1/t(n)}), it must hold that m = O(N^{1−1/t(n)}). This is satisfied because N ≥ log n and 4 ≤ t(n) ≤ O(√(log n / log log n)). The number of levels is determined by L(N) = 1 + L(N^{1−1/t(n)}). The condition t(n) ≥ 4 ensures that |log(1 − 1/t(n))| = Θ(1/t(n)). Hence

$$L(N) = O(t(n) \log\log N),$$

which also bounds the search time.

The full construction time is given by

$$C(N) \le O\left(N^{1+1/t(n)} (\log w + \log n)(m + \log n)\right) + \sum_i C(N_i)$$

$$\implies C(N) \le O\left(N^{1+1/t(n)} (\log w + \log n)\, N^{1/t(n)}\, t(n) \log n\right) + \sum_i C(N_i)$$

$$\implies C(N) = L(N) \cdot O\left(N^{1+2/t(n)}\, t(n) \log n\, (\log w + \log n)\right).$$

The setting for r guarantees linear space consumption. □

The described static dictionary is dynamized naturally because data is stored 
in a tree structure. 

Theorem 4. Let t : ℕ → ℕ be a nondecreasing function computable in time and space O(n), with 4 ≤ t(n) ≤ O(√(log n / log log n)). There exists a linear space dynamic deterministic dictionary with query time O(t(n) log log n) and update time

$$O\left(n^{5/t(n)}\, t^2(n) \log n \log\log n\, (\log w + \log n)\right).$$

The bound for lookup time is worst-case, while the bound for update time is amortized.

⁷ We omitted ceilings for clarity.
Proof. A standard rebuilding scheme of varying locality is used. Rebuilding initiated at some node causes the reconstruction of the whole subtree rooted at that node. The local reconstruction of a subtree is performed once the number of updates reaches half the number of elements present in that subtree at its last rebuilding. Assume that only insertions are performed. This case is worse than mixed insertions and deletions, because the size changes faster. Also assume that new elements are “pumped” along the same path of the global tree. It will be clear that this is the worst case.

Suppose that the number of elements in a subtree was N after the last rebuilding, which had occurred at some higher-level node (an ancestor of the subtree’s root). We want to bound the total time spent on rebuildings of the subtree between rebuildings initiated at higher-level nodes. The settings of the construction parameters (s, r, m) depend on the value of n. In the dynamic case, n is the total number of elements at the last global rebuilding. Let K_n = t^2(n) log n log log n (log w + log n) and t = t(n). The sum of the times for the reconstructions of the whole subtree is

$$O\left(\left(\tfrac{3}{2}N\right)^{1+2/t} K_n\right) + O\left(\left(\left(\tfrac{3}{2}\right)^{2} N\right)^{1+2/t} K_n\right) + \cdots + O\left(\left(\left(\tfrac{3}{2}\right)^{p} N\right)^{1+2/t} K_n\right), \qquad (8)$$



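where p is such that (3/2)^p N = (1/2)N^{t/(t−1)} + N. The first term on the right side represents the number of updates, and N^{t/(t−1)} is the number of elements contained in the tree rooted at the direct ancestor node of the subtree we observe. We use an equality in the definition of p because, to simplify the proof, we assume that p is an integer. The assumption does not affect the asymptotic bounds. Putting q = (3/2)^{1+2/t}, (8) becomes

$$O\left(N^{1+2/t} K_n (q + q^2 + \cdots + q^p)\right) = O\left(N^{1+2/t} K_n \frac{q^{p+1} - q}{q - 1}\right) = O\left(N^{1+2/t} K_n\, q^{p}\right). \qquad (9)$$

From the definitions of p and q we get

$$q^{p} = \left(\frac{1}{2} N^{1/(t-1)} + 1\right)^{1+2/t} = O\left(N^{3/t}\right),$$

since (1 + 2/t)/(t − 1) ≤ 3/t for t ≥ 3.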

Using t(n) ≥ 4 and substituting into (9) gives a bound of O(N^{1+5/t} K_n). To find the complete sum of times, we must include reconstructions initiated at lower-level nodes descending from the subtree’s root. Let R(N) denote the desired time. As we said, N is the size of the subtree at the beginning of the phase which ends with a higher-level reconstruction. The initial size of a bucket in the root node is N^{1−1/t}, but this value grows after each reconstruction of the root. Thus the following recurrence holds:

$$R(N) = O\left(N^{1+5/t} K_n\right) + R\left(N^{1-1/t}\right) + R\left(\left(\tfrac{3}{2} N\right)^{1-1/t}\right) + \cdots$$

The last term in the sum is less than R(N/c) for a constant c > 1. An induction argument proves that R(N) = O(N^{1+5/t} K_n). The total time for n/2 insertions includes a global rebuilding:

$$R(n) + O\left(\left(\tfrac{3}{2} n\right)^{1+2/t(n)} K_n\right) = O\left(n^{1+5/t(n)} K_n\right).$$

Amortized over the n/2 insertions, this gives the claimed update time O(n^{5/t(n)} K_n). □
Abstract. Let A and B be two n × n matrices over a ring R (e.g., the reals or the integers), each containing at most m non-zero elements. We present a new algorithm that multiplies A and B using O(m^{0.7} n^{1.2} + n^{2+o(1)}) algebraic operations (i.e., multiplications, additions and subtractions) over R. For m ≤ n^{1.14}, the new algorithm performs an almost optimal number of only n^{2+o(1)} operations. For m ≤ n^{1.68}, the new algorithm is also faster than the best known matrix multiplication algorithm for dense matrices, which uses O(n^{2.38}) algebraic operations. The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea and existing fast rectangular matrix multiplication algorithms. We also obtain improved algorithms for the multiplication of more than two sparse matrices. As the known fast rectangular matrix multiplication algorithms are far from being practical, our result, at least for now, is only of theoretical value.



1 Introduction 

The multiplication of two n × n matrices is one of the most basic algebraic problems, and considerable effort has been devoted to obtaining efficient algorithms for the task. The naive matrix multiplication algorithm performs O(n^3) operations. Strassen [23] was the first to show that the naive algorithm is not optimal, giving an O(n^{2.81}) algorithm for the problem. Many improvements then followed. The currently fastest matrix multiplication algorithm, with a complexity of O(n^{2.38}), was obtained by Coppersmith and Winograd [8]. More information on the fascinating subject of matrix multiplication algorithms and its history can be found in Pan [16] and Bürgisser et al. [3]. An interesting new group-theoretic approach to the matrix multiplication problem was recently suggested by Cohn and Umans [6]. For the best available lower bounds see Shpilka [22] and Raz [18].

Matrix multiplication has numerous applications in combinatorial optimization 
in general, and in graph algorithms in particular. Fast matrix multiplication 
algorithms can be used, for example, to obtain fast algorithms for finding simple 
cycles in graphs [1,2,24], for finding small cliques and other small subgraphs [15], 
for finding shortest paths [20,21,25], for obtaining improved dynamic reachability 
algorithms [9,19], and for matching problems [14,17,5]. Other applications can 
be found in [4,13], and this list is not exhaustive. 
In many cases, the matrices to be multiplied are sparse, i.e., the number of non-zero elements in them is negligible compared to the number of zeros. For example, if G = (V, E) is a directed graph on n vertices containing m edges, then its adjacency matrix A_G is an n × n matrix with m non-zero elements (1’s in this case). In many interesting cases m = o(n^2). Unfortunately, the fast matrix multiplication algorithms mentioned above cannot exploit the sparsity of the matrices being multiplied. The complexity of the algorithm of [8], for example, remains O(n^{2.38}) even if the multiplied matrices are extremely sparse.

The naive matrix multiplication algorithm, on the other hand, can be used to multiply two n × n matrices, each with at most m non-zero elements, using O(mn) operations (see next section). Thus, for m = O(n^{1.37}), the sophisticated matrix multiplication algorithms do not provide any improvement over the naive matrix multiplication algorithm.

In this paper we show that the sophisticated matrix multiplication algorithms can nevertheless be used to speed up the computation of the product of even extremely sparse matrices. More specifically, we present a new algorithm that multiplies two n × n matrices, each with at most m non-zero elements, using O(m^{0.7} n^{1.2} + n^{2+o(1)}) algebraic operations. (The exponents 0.7 and 1.2 are derived, of course, from the current 2.38 bound on the exponent of matrix multiplication, and from bounds on other exponents related to matrix multiplication, as will be explained in the sequel.) There are three important things to notice:
(i) If m ≥ n^{1+ε}, for any ε > 0, then the number of operations performed by the new algorithm is o(mn), i.e., less than the number of operations performed, in the worst case, by the naive algorithm.
(ii) If m ≤ n^{1.14}, then the new algorithm performs only n^{2+o(1)} operations. This is very close to optimal, as all n^2 entries of the product may be non-zero, even if the multiplied matrices are very sparse.
(iii) If m ≤ n^{1.68}, then the new algorithm performs only o(n^{2.38}) operations, i.e., fewer operations than the fastest known matrix multiplication algorithm.

In other words, the new algorithm improves on the naive algorithm even for extremely sparse matrices (i.e., m = n^{1+ε}), and it improves on the fastest matrix multiplication algorithm even for relatively dense matrices (i.e., m = n^{1.68}).

The new algorithm is obtained using a surprisingly straightforward combina- 
tion of a simple combinatorial idea, implicit in Eisenbrand and Grandoni [10] 
and Yuster and Zwick [24], with the fast matrix multiplication algorithm of 
Coppersmith and Winograd [8], and the fast rectangular matrix multiplication 
algorithm of Coppersmith [7]. It is interesting to note that a fast rectangular ma- 
trix multiplication algorithm for dense matrices is used to obtain a fast matrix 
multiplication algorithm for sparse square matrices. 

As mentioned above, matrix multiplication algorithms are used to obtain fast algorithms for many different graph problems. We note (with some regret...) that our improved sparse matrix multiplication algorithm does not automatically yield improved algorithms for these problems on sparse graphs. These algorithms may need to multiply dense matrices even if the input graph is sparse. Consider, for example, the computation of the transitive closure of a graph by repeatedly squaring its adjacency matrix. The matrix obtained after the first squaring may already be extremely dense. Still, we expect to find many situations in which the new algorithm presented here could be useful.

In view of the above remark, we also consider the problem of computing the product A_1 A_2 ⋯ A_k of three or more sparse matrices. As the product of even very sparse matrices can be completely dense, the new algorithm for multiplying two matrices cannot be applied directly in this case. We show, however, that some improved bounds may also be obtained in this case, although our results here are less impressive. For k = 3, we improve, for certain densities, on the performance of all existing algorithms. For k ≥ 4 we currently get no worst-case improvements, but such improvements would follow if the bounds on certain matrix multiplication exponents were sufficiently improved.

The rest of the paper is organized as follows. In the next section we review 
the existing matrix multiplication algorithms. In Section 3 we present the main 
result of this paper, i.e., the improved sparse matrix multiplication algorithm. 
In Section 4 we use similar ideas to obtain an improved algorithm for the mul- 
tiplication of three or more sparse matrices. We end, in Section 5, with some 
concluding remarks and open problems. 



2 Existing Matrix Multiplication Algorithms 

In this short section we examine the worst-case behavior of the naive matrix 
multiplication algorithm and state the performance of existing fast matrix mul- 
tiplication algorithms. 



2.1 The Naive Matrix Multiplication Algorithm 

Let $A$ and $B$ be two $n \times n$ matrices. The product $C = AB$ is defined as follows:
$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$, for $1 \le i, j \le n$. The naive matrix multiplication algorithm
uses this definition to compute the entries of $C$ using $n^3$ multiplications and
$n^3 - n^2$ additions. The number of operations can be reduced by avoiding the
computation of products $a_{ik}b_{kj}$ for which $a_{ik} = 0$ or $b_{kj} = 0$. In general, if
we let $\bar{a}_k$ be the number of non-zero elements in the $k$-th column of $A$, and $\bar{b}_k$
be the number of non-zero elements in the $k$-th row of $B$, then the number of
multiplications that need to be performed is only $\sum_{k=1}^{n} \bar{a}_k\bar{b}_k$. The number of
additions required is always bounded by the required number of multiplications.
This simple sparse matrix multiplication algorithm may be considered folklore.
It can also be found in Gustavson [11].
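A minimal Python sketch of this folklore algorithm, assuming $A$ is stored by columns and $B$ by rows as dictionaries of non-zeros (a storage choice made for the example, not taken from Gustavson [11]). Computing $AB$ as a sum of outer products makes the multiplication count $\sum_k \bar{a}_k\bar{b}_k$ explicit.

```python
def naive_sparse_product(a_cols, b_rows, n):
    """Naive sparse product C = AB computed as the sum of outer products A_{*k} B_{k*}.

    a_cols[k] maps a row index i to a_ik (the non-zeros of column k of A);
    b_rows[k] maps a column index j to b_kj (the non-zeros of row k of B).
    Returns C as a dict of rows and the number of scalar multiplications,
    which equals the sum over k of len(a_cols[k]) * len(b_rows[k]).
    """
    c = {}
    mults = 0
    for k in range(n):
        for i, a_ik in a_cols[k].items():
            row = c.setdefault(i, {})
            for j, b_kj in b_rows[k].items():
                row[j] = row.get(j, 0) + a_ik * b_kj
                mults += 1
    return c, mults
```

Only pairs of non-zeros sharing the inner index $k$ are ever multiplied, so zero entries cost nothing.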

If $A$ contains at most $m$ non-zero entries, then $\sum_{k=1}^{n} \bar{a}_k\bar{b}_k \le \left(\sum_{k=1}^{n} \bar{a}_k\right) n \le mn$,
since $\bar{b}_k \le n$ for every $k$.
The same bound is obtained when $B$ contains at most $m$ non-zero entries.
Can we get an improved bound on the worst-case number of products required
when both $A$ and $B$ are sparse? Unfortunately, the answer is no. Assume that
$m \ge n$ and consider the case $\bar{a}_i = \bar{b}_i = n$ if $i \le m/n$, and $\bar{a}_i = \bar{b}_i = 0$
otherwise. (In other words, all non-zero elements of $A$ and $B$ are concentrated
in the first $m/n$ columns of $A$ and the first $m/n$ rows of $B$.) In this case
$\sum_{k=1}^{n} \bar{a}_k\bar{b}_k = (m/n) \cdot n^2 = mn$. Thus, the naive algorithm may have to perform
$mn$ multiplications even if both matrices are sparse. It is instructive to note that
the computation of $AB$ in this worst-case example can be reduced to the computation
of a much smaller rectangular product. This illustrates the main idea
behind the new algorithm: when the naive algorithm performs many operations,
rectangular matrix multiplication can be used to speed up the computation.
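The reduction can be checked on a small hypothetical instance in Python: when all non-zeros of $A$ lie in its first $t = m/n$ columns and those of $B$ in its first $t$ rows, the naive algorithm performs $t \cdot n^2 = mn$ multiplications, while $AB$ equals the product of an $n \times t$ matrix by a $t \times n$ matrix.

```python
def matmul(X, Y):
    """Plain dense product of list-of-lists matrices (X is p x q, Y is q x s)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def worst_case_pair(n, t):
    """Hypothetical instance: A is dense in its first t columns, B in its first t rows."""
    A = [[i + k + 1 if k < t else 0 for k in range(n)] for i in range(n)]
    B = [[k * j + 1 if k < t else 0 for j in range(n)] for k in range(n)]
    return A, B


n, t = 6, 2                       # t = m/n, so each matrix has m = t*n non-zeros
A, B = worst_case_pair(n, t)
# Naive cost: sum over k of (non-zeros in column k of A) * (non-zeros in row k of B).
naive_mults = sum(sum(1 for i in range(n) if A[i][k]) * sum(1 for j in range(n) if B[k][j])
                  for k in range(n))
A_left = [row[:t] for row in A]   # the n x t block holding all non-zeros of A
B_top = B[:t]                     # the t x n block holding all non-zeros of B
```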

To do justice to the naive matrix multiplication algorithm we should note
that in many cases that appear in practice the matrices to be multiplied have a
special structure, and the number of operations required may be much smaller
than $mn$. For example, if the non-zero elements of $A$ are evenly distributed
among the columns of $A$, and the non-zero elements of $B$ are evenly distributed
among the rows of $B$, we have $\bar{a}_k = \bar{b}_k = m/n$, for $1 \le k \le n$, and $\sum_{k=1}^{n} \bar{a}_k\bar{b}_k =
n \cdot (m/n)^2 = m^2/n$. We are interested here, however, in worst-case bounds that
hold for any placement of non-zero elements in the input matrices.
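The two extreme placements can be compared directly. The Python sketch below evaluates the naive cost $\sum_k \bar{a}_k\bar{b}_k$ for a concentrated and an evenly spread placement of $m$ non-zeros; the values of $n$ and $m$ are arbitrary illustrative choices.

```python
def naive_cost(col_counts, row_counts):
    """Multiplications of the naive algorithm: the sum over k of a_k * b_k."""
    return sum(a * b for a, b in zip(col_counts, row_counts))


n, m = 1000, 10_000                                  # m/n = 10 non-zeros per line on average
concentrated = [n] * (m // n) + [0] * (n - m // n)   # worst case from Section 2.1
even = [m // n] * n                                  # evenly spread placement
```

Here `naive_cost(concentrated, concentrated)` equals $mn = 10^7$, whereas `naive_cost(even, even)` equals $m^2/n = 10^5$.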

2.2 Fast Matrix Multiplication Algorithms for Dense Matrices 

Let $M(a,b,c)$ be the minimal number of algebraic operations needed to multiply
an $a \times b$ matrix by a $b \times c$ matrix over an arbitrary ring $R$. Let $\omega(r,s,t)$ be the
minimal exponent $\omega$ for which $M(n^r, n^s, n^t) = n^{\omega + o(1)}$. We are interested
here mainly in $\omega = \omega(1,1,1)$, the exponent of square matrix multiplication, and
$\omega(1,r,1)$, the exponent of rectangular matrix multiplication of a particular form.
The best bounds available on $\omega(1,r,1)$, for $0 \le r \le 1$, are summarized in the
following theorems:

Theorem 1 (Coppersmith and Winograd [8]). $\omega < 2.376$.

We next define the constants $\alpha$ and $\beta$ of rectangular matrix multiplication.

Definition 1. $\alpha = \max\{\, 0 \le r \le 1 \mid \omega(1,r,1) = 2 \,\}$, $\beta = \frac{\omega - 2}{1 - \alpha}$.



Theorem 2 (Coppersmith [7]). $\alpha > 0.294$.

It is not difficult to see that Theorems 1 and 2 imply the following corollary.
A proof can be found, for example, in Huang and Pan [12].



Corollary 1. $M(n,\ell,n) \le \ell^{\beta} n^{2-\alpha\beta+o(1)} + n^{2+o(1)}$.

All the bounds in the rest of the paper will be expressed in terms of $\alpha$ and $\beta$. Note
that with $\omega = 2.376$ and $\alpha = 0.294$ we get $\beta \approx 0.533$. If $\omega = 2$, as conjectured
by many, then $\alpha = 1$. (In this case $\beta$ is not defined, but also not needed.)
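A quick numerical check of these constants, as a Python sketch: $\beta$ is computed from Definition 1, and the exponent of $M(n, n^r, n)$ implied by Corollary 1 is $\max\{2 - \alpha\beta + \beta r, 2\}$ once the $o(1)$ terms are dropped. At $r = 1$ it recovers $\omega$, since $2 + \beta(1-\alpha) = \omega$.

```python
def beta(omega, alpha):
    """beta = (omega - 2) / (1 - alpha), as in Definition 1."""
    return (omega - 2) / (1 - alpha)


def corollary1_exponent(r, omega=2.376, alpha=0.294):
    """Exponent e with M(n, n^r, n) <= n^(e + o(1)) according to Corollary 1."""
    b = beta(omega, alpha)
    return max(2 - alpha * b + b * r, 2.0)
```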



Theorem 3. 
3 The New Sparse Matrix Multiplication Algorithm

Let $A_{*k}$ be the $k$-th column of $A$, and let $B_{k*}$ be the $k$-th row of $B$, for $1 \le k \le n$.
Clearly $AB = \sum_{k=1}^{n} A_{*k}B_{k*}$. (Note that $A_{*k}$ is a column vector, $B_{k*}$ a row vector,
and $A_{*k}B_{k*}$ is an $n \times n$ matrix.) Let $a_k$ be the number of non-zero elements
in $A_{*k}$ and let $b_k$ be the number of non-zero elements in $B_{k*}$. (For brevity, we
omit the bars over $a_k$ and $b_k$ used in Section 2.1. No confusion will arise here.)
As explained in Section 2.1, we can naively compute $AB$ using $O(\sum_k a_k b_k)$
operations. If $A$ and $B$ each contain $m$ non-zero elements, then $\sum_k a_k b_k$ may be
as high as $mn$. (See the example in Section 2.1.)

For any subset $I \subseteq [n]$ let $A_{*I}$ be the submatrix composed of the columns of $A$
whose indices are in $I$ and let $B_{I*}$ be the submatrix composed of the rows of $B$ whose
indices are in $I$. If $J = [n] - I$, then we clearly have $AB = A_{*I}B_{I*} + A_{*J}B_{J*}$. Note
that $A_{*I}B_{I*}$ and $A_{*J}B_{J*}$ are both rectangular matrix multiplications. Recall
that $M(n,\ell,n)$ is the cost of multiplying an $n \times \ell$ matrix by an $\ell \times n$ matrix
using the fastest rectangular matrix multiplication algorithm.

Let $\pi$ be a permutation for which $a_{\pi(1)}b_{\pi(1)} \ge a_{\pi(2)}b_{\pi(2)} \ge \cdots \ge a_{\pi(n)}b_{\pi(n)}$.
A permutation $\pi$ satisfying this requirement can be easily found in $O(n)$ time
using radix sort. The algorithm chooses a value $0 \le \ell \le n$, in a way that will
be specified shortly, and sets $I = \{\pi(1), \ldots, \pi(\ell)\}$ and $J = \{\pi(\ell+1), \ldots, \pi(n)\}$.
The product $A_{*I}B_{I*}$ is then computed using the fastest available rectangular
matrix multiplication algorithm, using $M(n,\ell,n)$ operations, while the product
$A_{*J}B_{J*}$ is computed naively using $O(\sum_{k>\ell} a_{\pi(k)}b_{\pi(k)})$ operations. The two
matrices $A_{*I}B_{I*}$ and $A_{*J}B_{J*}$ are added using $O(n^2)$ operations. We naturally
choose the value $\ell$ that minimizes $M(n,\ell,n) + \sum_{k>\ell} a_{\pi(k)}b_{\pi(k)}$. (This can easily
be done in $O(n)$ time by simply checking all possible values.) The resulting algorithm,
which we call SMP (Sparse Matrix Multiplication), is given in Figure 1.
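The following Python sketch mirrors the steps of Figure 1 under stated assumptions: matrices are dense lists of lists over the integers, the cost model `cost_M(n, l)` for $M(n,\ell,n)$ is supplied by the caller (hypothetical; for instance `lambda n, l: n ** (2 - alpha * beta) * l ** beta + n ** 2` from Corollary 1 with the $o(1)$ terms dropped), and the "fast" rectangular step is an ordinary dense loop. It is meant to show the split into heavy and light indices, not to achieve the stated running time.

```python
def smp(A, B, cost_M):
    """Sketch of algorithm SMP (Fig. 1) for square list-of-lists matrices.

    cost_M(n, l) is a caller-supplied estimate of M(n, l, n); the model is an
    assumption of this sketch. Step 6 uses a plain dense loop as a stand-in
    for a genuinely fast rectangular matrix multiplication algorithm.
    Returns the product and the chosen split point l.
    """
    n = len(A)
    a = [sum(1 for i in range(n) if A[i][k]) for k in range(n)]  # step 1: a_k
    b = [sum(1 for j in range(n) if B[k][j]) for k in range(n)]  # step 2: b_k
    # Step 3: order indices by a_k * b_k, largest first (the paper uses radix sort).
    pi = sorted(range(n), key=lambda k: a[k] * b[k], reverse=True)
    # Step 4: tail[s] = sum of a_pi(k) * b_pi(k) over the positions after the first s.
    tail = [0] * (n + 1)
    for pos in range(n - 1, -1, -1):
        tail[pos] = tail[pos + 1] + a[pi[pos]] * b[pi[pos]]
    l = min(range(n + 1), key=lambda s: cost_M(n, s) + tail[s])
    I, J = pi[:l], pi[l:]                                         # step 5
    # Step 6: dense rectangular product A_{*I} B_{I*}.
    C = [[sum(A[i][k] * B[k][j] for k in I) for j in range(n)] for i in range(n)]
    # Step 7: naive sparse product A_{*J} B_{J*}, touching only non-zero pairs.
    for k in J:
        rows = [i for i in range(n) if A[i][k]]
        cols = [j for j in range(n) if B[k][j]]
        for i in rows:
            for j in cols:
                C[i][j] += A[i][k] * B[k][j]  # step 8: accumulate C_1 + C_2 in place
    return C, l
```

Whatever cost model is passed in, the result is $AB$, because $I$ and $J$ always partition $[n]$; the model only decides how the work is split.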
We now claim: 

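Theorem 4. Algorithm SMP($A$, $B$) computes the product of two $n \times n$ matrices
over a ring $R$, with $m_1$ and $m_2$ non-zero elements respectively, using at most
$$O\left(\min\left\{ (m_1m_2)^{\frac{\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}+o(1)} + n^{2+o(1)},\; m_1 n,\; m_2 n,\; n^{\omega+o(1)} \right\}\right)$$
ring operations.

When $m_1 = m_2 = m$, the first term in the bound above is $m^{\frac{2\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}+o(1)}$.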

It is easy to check that for $m = O(n^{1+\alpha/2})$, the number of operations performed
by the algorithm is only $n^{2+o(1)}$. It is also not difficult to check that for
$m = O(n^{\frac{\omega+1}{2}-\varepsilon})$, for any $\varepsilon > 0$, the algorithm performs only $o(n^{\omega})$ operations.
Using the currently best available bounds on $\omega$, $\alpha$ and $\beta$, namely $\omega \approx 2.376$, $\alpha \approx 0.294$
and $\beta \approx 0.533$, we get that the number of operations performed by the algorithm
is at most $O(m^{0.7}n^{1.2} + n^{2+o(1)})$, justifying the claims made in the abstract and
the introduction.
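The exponents quoted above can be reproduced numerically. This Python sketch evaluates the first term of Theorem 4 with $m_1 = m_2 = m$ (ignoring $o(1)$ terms) and can be used to check that it reaches $n^2$ exactly at $m = n^{1+\alpha/2}$ and $n^{\omega}$ exactly at $m = n^{(\omega+1)/2}$.

```python
def smp_exponents(omega=2.376, alpha=0.294):
    """Exponents from Theorem 4 with m1 = m2 = m, using beta = (omega-2)/(1-alpha)."""
    beta = (omega - 2) / (1 - alpha)
    return {
        "m_exp": 2 * beta / (beta + 1),            # exponent of m in the first term
        "n_exp": (2 - alpha * beta) / (beta + 1),  # exponent of n in the first term
        "quadratic_up_to": 1 + alpha / 2,          # m = O(n^(1+alpha/2)) gives n^(2+o(1))
        "sub_omega_below": (omega + 1) / 2,        # below this, o(n^omega) operations
    }
```

Rounding `m_exp` and `n_exp` to one decimal gives the $m^{0.7}n^{1.2}$ bound stated above.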

The proof of Theorem 4 relies on the following simple lemma: 
Algorithm SMP($A$, $B$)

Input: Two $n \times n$ matrices $A$ and $B$.

Output: The product $AB$.

1. Let $a_k$ be the number of non-zero elements in $A_{*k}$, for $1 \le k \le n$.

2. Let $b_k$ be the number of non-zero elements in $B_{k*}$, for $1 \le k \le n$.

3. Let $\pi$ be a permutation for which $a_{\pi(1)}b_{\pi(1)} \ge \cdots \ge a_{\pi(n)}b_{\pi(n)}$.

4. Find $0 \le \ell \le n$ that minimizes $M(n,\ell,n) + \sum_{k>\ell} a_{\pi(k)}b_{\pi(k)}$.

5. Let $I = \{\pi(1), \ldots, \pi(\ell)\}$ and $J = \{\pi(\ell+1), \ldots, \pi(n)\}$.

6. Compute $C_1 \leftarrow A_{*I}B_{I*}$ using a fast dense rectangular matrix multiplication algorithm.

7. Compute $C_2 \leftarrow A_{*J}B_{J*}$ using the naive sparse matrix multiplication algorithm.

8. Output $C_1 + C_2$.

Fig. 1. The new fast sparse matrix multiplication algorithm.



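Lemma 1. For any $1 \le \ell \le n$ we have $\sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le \frac{m_1m_2}{\ell}$.

Proof. Assume, without loss of generality, that $a_1 \ge a_2 \ge \cdots \ge a_n$. Let $1 \le \ell \le n$.
As the indices $\pi(k)$, for $k > \ell$, are those of the $n - \ell$ smallest products $a_kb_k$, we
clearly have $\sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le \sum_{k>\ell} a_kb_k$. Also $\ell a_{\ell+1} \le \sum_{k\le\ell} a_k \le m_1$.
Thus $a_{\ell+1} \le m_1/\ell$. Putting this together we get
$$\sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le \sum_{k>\ell} a_kb_k \le a_{\ell+1}\sum_{k>\ell} b_k \le \frac{m_1}{\ell}\, m_2 = \frac{m_1m_2}{\ell}. \qquad \Box$$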
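Lemma 1 can also be sanity-checked on random data. The Python sketch below draws hypothetical non-zero counts $a_k$, $b_k$ and verifies the bound for every $\ell$, using the integer form $\ell \cdot \sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le m_1m_2$ to avoid division.

```python
import random


def lemma1_holds(a, b):
    """Check Lemma 1: sum_{k>l} a_pi(k) b_pi(k) <= m1*m2/l for every 1 <= l <= n.

    a[k] and b[k] are the non-zero counts of column k of A and row k of B.
    """
    m1, m2 = sum(a), sum(b)
    products = sorted((x * y for x, y in zip(a, b)), reverse=True)  # order given by pi
    for l in range(1, len(a) + 1):
        tail = sum(products[l:])   # the n - l smallest products
        if l * tail > m1 * m2:     # integer form of tail > m1*m2/l
            return False
    return True


def random_counts(n, rng):
    """Hypothetical test data: counts in [0, n] for the columns of A and rows of B."""
    return ([rng.randint(0, n) for _ in range(n)],
            [rng.randint(0, n) for _ in range(n)])
```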

We are now ready for the proof of Theorem 4. 



Proof (of Theorem 4). We first examine the two extreme possible choices of $\ell$.
When $\ell = 0$, the naive matrix multiplication algorithm is used. When $\ell = n$,
a fast dense square matrix multiplication algorithm is used. As the algorithm
chooses the value of $\ell$ that minimizes the cost, it is clear that the number of
operations performed by the algorithm is $O(\min\{m_1n, m_2n, n^{\omega+o(1)}\})$.

All that remains, therefore, is to show that the number of operations performed
by the algorithm is also $O\big((m_1m_2)^{\frac{\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}+o(1)} + n^{2+o(1)}\big)$. If $m_1m_2 \le n^{2+\alpha}$,
let $\ell = m_1m_2/n^2$. As $\ell \le n^{\alpha}$, we have
$$M(n,\ell,n) + \sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le n^{2+o(1)} + \frac{m_1m_2}{\ell} = n^{2+o(1)}.$$
If $m_1m_2 \ge n^{2+\alpha}$, let $\ell = (m_1m_2)^{\frac{1}{\beta+1}} n^{\frac{\alpha\beta-2}{\beta+1}}$. It is easy to verify that $\ell \ge n^{\alpha}$, and
therefore
$$M(n,\ell,n) \le \ell^{\beta} n^{2-\alpha\beta+o(1)} = (m_1m_2)^{\frac{\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}+o(1)},$$
$$\sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le \frac{m_1m_2}{\ell} = (m_1m_2)^{\frac{\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}}.$$
Thus, $M(n,\ell,n) + \sum_{k>\ell} a_{\pi(k)}b_{\pi(k)} \le (m_1m_2)^{\frac{\beta}{\beta+1}} n^{\frac{2-\alpha\beta}{\beta+1}+o(1)}$. As the algorithm
chooses the value of $\ell$ that minimizes the number of operations, this completes
the proof of the theorem. $\Box$
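The choice of $\ell$ in the second case is exactly the value that balances the two leading terms. The Python sketch below computes it and both costs, dropping all $o(1)$ terms and constant factors (an approximation made only for the illustration), and assuming $m_1, m_2 \ge 1$.

```python
def balanced_split(m1, m2, n, omega=2.376, alpha=0.294):
    """Value of l from the proof of Theorem 4 and the two leading costs it balances.

    Returns (l, rectangular cost, naive cost), where the rectangular cost is the
    Corollary 1 term without o(1) factors and the naive cost is the Lemma 1
    bound m1*m2/l. Assumes m1, m2 >= 1.
    """
    beta = (omega - 2) / (1 - alpha)
    if m1 * m2 <= n ** (2 + alpha):
        ell = m1 * m2 / n ** 2                  # ell <= n^alpha, so M(n, ell, n) ~ n^2
        return ell, n ** 2, m1 * m2 / ell
    ell = (m1 * m2) ** (1 / (beta + 1)) * n ** ((alpha * beta - 2) / (beta + 1))
    return ell, n ** (2 - alpha * beta) * ell ** beta, m1 * m2 / ell
```

In the second branch the two returned costs agree up to floating-point rounding, which is the balancing condition $\ell^{\beta} n^{2-\alpha\beta} = m_1m_2/\ell$ solved for $\ell$.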



4 Multiplying Three or More Matrices 



In this section we extend the results of the previous section to the product of
three or more matrices. Let $A_1, A_2, \ldots, A_k$ be $n \times n$ matrices, and let $m_r$, for
$1 \le r \le k$, be the number of non-zero elements in $A_r$. Let $B = A_1A_2\cdots A_k$ be the
product of the $k$ matrices. As the product of two sparse matrices is not necessarily
sparse, we cannot use the algorithm of the previous section directly to efficiently
compute the product of more than two sparse matrices. Nevertheless, we show
that the algorithm of the previous section can be generalized to efficiently handle
the product of more than two matrices.

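Let $A_r = (a^{(r)}_{ij})$, for $1 \le r \le k$, and $B = A_1A_2\cdots A_k = (b_{ij})$. It follows easily
from the definition of matrix multiplication that
$$b_{ij} = \sum_{r_1, r_2, \ldots, r_{k-1}} a^{(1)}_{i,r_1}\, a^{(2)}_{r_1,r_2} \cdots a^{(k-1)}_{r_{k-2},r_{k-1}}\, a^{(k)}_{r_{k-1},j}. \qquad (1)$$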



It is convenient to interpret the computation of $b_{ij}$ as the summation over paths
in a layered graph, as shown (for the case $k = 4$) in Figure 2. More precisely, the
layered graph corresponding to the product $A_1A_2\cdots A_k$ is composed of $k+1$
layers $V_1, V_2, \ldots, V_{k+1}$. Each layer $V_r$, where $1 \le r \le k+1$, is composed of $n$
vertices $v_{r,i}$, for $1 \le i \le n$. For each non-zero element $a^{(r)}_{ij}$ in $A_r$, there is an edge
$v_{r,i} \to v_{r+1,j}$ in the graph labelled by the element $a^{(r)}_{ij}$. The element $b_{ij}$ of the
product is then the sum over all directed paths in the graph from $v_{1,i}$ to $v_{k+1,j}$
of the product of the elements labelling the edges of the path.
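The path interpretation can be made concrete with a short Python sketch that enumerates every directed path of the layered graph from each $v_{1,i}$ and adds the product of its edge labels to $b_{ij}$. The number of paths can be as large as $n^{k+1}$, so this only illustrates identity (1) on tiny inputs.

```python
def chain_product_by_paths(mats):
    """Compute A_1 A_2 ... A_k by enumerating all paths of the layered graph.

    mats is a list of k square list-of-lists matrices; layer index t (0-based)
    stands for V_{t+1}, and the non-zeros of mats[t] are the edges V_{t+1} -> V_{t+2}.
    """
    n, k = len(mats[0]), len(mats)
    B = [[0] * n for _ in range(n)]

    def walk(start, layer, vertex, weight):
        if layer == k:                      # reached v_{k+1, vertex}: one complete path
            B[start][vertex] += weight
            return
        for nxt, label in enumerate(mats[layer][vertex]):
            if label:                       # edge from v_{layer+1, vertex} to v_{layer+2, nxt}
                walk(start, layer + 1, nxt, weight * label)

    for i in range(n):
        walk(i, 0, i, 1)
    return B
```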

Algorithm SCMP (Sparse Chain Matrix Multiplication), given in Figure 3, is
a generalization of the variant of algorithm SMP given in the previous section
for the product of two matrices. The algorithm starts by setting $\ell$ to
$\left(\prod_{r=1}^{k} m_r\right)^{\frac{1}{k-1+\beta}} n^{\frac{\alpha\beta-2}{k-1+\beta}}$, where $m_r$ is the number of non-zero entries in $A_r$, for
$1 \le r \le k$. (Note that when $k = 2$, we have $\ell = (m_1m_2)^{\frac{1}{\beta+1}} n^{\frac{\alpha\beta-2}{\beta+1}}$, as in the
proof of Theorem 4.) Next, the algorithm lets $I_r$ be the set of indices of the $\ell$
rows of $A_r$ with the largest number of non-zero elements, ties broken arbitrarily,
for $2 \le r \le k$. It also lets $J_r = [n] - I_r$ be the set of indices of the $n - \ell$ rows of $A_r$
with the smallest number of non-zero elements. The rows of $A_r$ with indices in $I_r$
are said to be the heavy rows of $A_r$, while the rows of $A_r$ with indices in $J_r$ are
said to be light rows. The algorithm is then ready to do some calculations. For
every $1 \le r \le k$, it computes $P_r \leftarrow (A_1)_{*J_2}(A_2)_{J_2J_3}\cdots(A_r)_{J_r*}$. This is done by
enumerating all the corresponding paths in the layered graph corresponding to
the product. The matrix $P_r$ is an $n \times n$ matrix that gives the contribution of the
light paths, i.e., paths that do not use elements from heavy rows of $A_2, \ldots, A_r$,
to the prefix product $A_1A_2\cdots A_r$. Next, the algorithm computes the suffix products
$S_r \leftarrow A_r\cdots A_k$, for $2 \le r \le k$, using recursive calls to the algorithm. The
cost of these recursive calls, as we shall see, will be overwhelmed by the other
operations performed by the algorithm. The crucial step of the algorithm is the
computation of $B_r \leftarrow (P_{r-1})_{*I_r}(S_r)_{I_r*}$, for $2 \le r \le k$, using the fastest available
rectangular matrix multiplication algorithm. The algorithm then computes and
outputs the matrix $\left(\sum_{r=2}^{k} B_r\right) + P_k$.

Fig. 2. A layered graph corresponding to the product $A_1A_2A_3A_4$.
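A compact Python sketch of the decomposition behind SCMP, under simplifying assumptions that differ from Figure 3: $\ell$ is passed in instead of being computed from the $m_r$, the light prefixes are updated by the recurrence $P_r = (P_{r-1})_{*J_r}(A_r)_{J_r*}$ with dense multiplication instead of explicit path enumeration, and each $B_r$ is formed as an $n \times n$ product with zeros outside $I_r$. The output equals $A_1 \cdots A_k$ for every $\ell$, which is the content of the correctness argument; the running time is not representative.

```python
def mat_mul(X, Y):
    """Dense product of list-of-lists matrices; stands in for fast rectangular multiplication."""
    return [[sum(x * Y[t][j] for t, x in enumerate(row)) for j in range(len(Y[0]))]
            for row in X]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scmp(mats, ell):
    """Sketch of algorithm SCMP (Fig. 3) for a chain mats = [A_1, ..., A_k] and a given ell.

    Simplifications assumed here: ell is passed in rather than computed from the
    m_r; P_r is formed by zeroing heavy columns and multiplying densely instead of
    enumerating light paths; B_r keeps the full n x n shape with zeros outside I_r.
    """
    k, n = len(mats), len(mats[0])
    if k == 1:
        return [row[:] for row in mats[0]]
    P = mats[0]                                   # P_1 = A_1
    total = [[0] * n for _ in range(n)]           # running sum of the B_r
    for r in range(2, k + 1):
        A_r = mats[r - 1]
        by_weight = sorted(range(n), key=lambda j: sum(1 for x in A_r[j] if x), reverse=True)
        heavy = set(by_weight[:ell])              # I_r: the ell densest rows of A_r
        S_r = scmp(mats[r - 1:], ell)             # suffix product A_r ... A_k (recursive)
        # B_r = (P_{r-1})_{*I_r} (S_r)_{I_r*}: paths whose first heavy vertex lies in V_r.
        P_heavy = [[x if t in heavy else 0 for t, x in enumerate(row)] for row in P]
        total = mat_add(total, mat_mul(P_heavy, S_r))
        # P_r = (P_{r-1})_{*J_r} (A_r)_{J_r*}: extend the light paths by one more layer.
        P_light = [[0 if t in heavy else x for t, x in enumerate(row)] for row in P]
        P = mat_mul(P_light, A_r)
    return mat_add(total, P)                      # (sum_r B_r) + P_k
```

With `ell = 0` every path is light and the result is simply $P_k$; with `ell = n` every path is heavy and the result is $B_2 = A_1 S_2$.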

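Theorem 5. Let $A_1, A_2, \ldots, A_k$ be $n \times n$ matrices with $m_1, m_2, \ldots, m_k$
non-zero elements, respectively, where $k \ge 2$ is a constant. Then, algorithm
SCMP($A_1, A_2, \ldots, A_k$) correctly computes the product $A_1A_2\cdots A_k$ using
$$O\left(\left(\prod_{r=1}^{k} m_r\right)^{\frac{\beta}{k-1+\beta}} n^{\frac{(2-\alpha\beta)(k-1)}{k-1+\beta}+o(1)} + n^{2+o(1)}\right)$$
algebraic operations.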



Algorithm SCMP($A_1, A_2, \ldots, A_k$)

Input: $n \times n$ matrices $A_1, A_2, \ldots, A_k$.
Output: The product $A_1A_2\cdots A_k$.

1. Let $\ell = \left(\prod_{r=1}^{k} m_r\right)^{\frac{1}{k-1+\beta}} n^{\frac{\alpha\beta-2}{k-1+\beta}}$.

2. Let $I_r$ be the set of indices of $\ell$ rows of $A_r$ with the largest number
of non-zero elements, and let $J_r = [n] - I_r$, for $2 \le r \le k$.

3. Compute $P_r \leftarrow (A_1)_{*J_2}(A_2)_{J_2J_3}\cdots(A_r)_{J_r*}$ by enumerating all
corresponding paths, for $1 \le r \le k$.

4. Compute $S_r \leftarrow A_r\cdots A_k$, for $2 \le r \le k$, using recursive calls to
the algorithm.

5. Compute $B_r \leftarrow (P_{r-1})_{*I_r}(S_r)_{I_r*}$ using the fastest available rectangular
matrix multiplication algorithm, for $2 \le r \le k$.

6. Output $\left(\sum_{r=2}^{k} B_r\right) + P_k$.

Fig. 3. Computing the product of several sparse matrices.

Proof. It is easy to see that the outdegree of a vertex $v_{r,j}$ is the number of non-zero
elements in the $j$-th row of $A_r$. We say that a vertex $v_{r,j}$ is heavy if $j \in I_r$,
and light otherwise. (Note that vertices of $V_{k+1}$ are not classified as light or
heavy. The classification of $V_1$ vertices is not used below.) A path in the layered
graph is said to be light if all its intermediate vertices are light, and heavy if at
least one of its intermediate vertices is heavy.

Let $a^{(1)}_{i,r_1} a^{(2)}_{r_1,r_2} \cdots a^{(k)}_{r_{k-1},j}$ be one of the terms appearing in the sum of $b_{ij}$
given in (1). To prove the correctness of the algorithm we show that this term
appears in exactly one of the matrices $B_2, \ldots, B_k$ and $P_k$, which are added up to
produce the matrix returned by the algorithm. Indeed, if the path corresponding
to the term is light, then the term appears in $P_k$. Otherwise, let $V_r$ be the layer
containing the first heavy vertex appearing on the path. The term then appears in $B_r$
and in no other product. This completes the correctness proof.

We next consider the complexity of the algorithm. As mentioned, the outdegree
of a vertex $v_{r,j}$ is equal to the number of non-zero elements in the $j$-th row of $A_r$.
The total number of non-zero elements in $A_r$ is $m_r$. Let $d_r$ be the maximum
outdegree of a light vertex of $V_r$. The outdegree of every heavy vertex of $V_r$ is
then at least $d_r$. As there are $\ell$ heavy vertices, it follows that $d_r \le m_r/\ell$, for
$2 \le r \le k$.

The most time-consuming operations performed by the algorithm are the computation of
$$P_k \leftarrow (A_1)_{*J_2}(A_2)_{J_2J_3}\cdots(A_k)_{J_k*}$$
by explicitly going over all light paths in the layered graph, and the $k - 1$
rectangular products
$$B_r \leftarrow (P_{r-1})_{*I_r}(S_r)_{I_r*}, \quad \text{for } 2 \le r \le k.$$
The number of light paths in the graph is at most $m_1 \cdot d_2d_3\cdots d_k$. Using the
bounds we obtained on the $d_r$'s, and the choice of $\ell$, we get that the number of
light paths is at most
$$m_1 d_2 d_3 \cdots d_k \le \frac{\prod_{r=1}^{k} m_r}{\ell^{k-1}} = \left(\prod_{r=1}^{k} m_r\right)^{\frac{\beta}{k-1+\beta}} n^{\frac{(2-\alpha\beta)(k-1)}{k-1+\beta}}.$$
Thus, the time taken to compute $P_k$ is $O\Big(\big(\prod_{r=1}^{k} m_r\big)^{\frac{\beta}{k-1+\beta}} n^{\frac{(2-\alpha\beta)(k-1)}{k-1+\beta}}\Big)$. (Computing
the product of the elements along a path requires $k$ operations, but we
consider $k$ to be a constant.)

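As $|I_r| = \ell$, for $2 \le r \le k$, the product $(P_{r-1})_{*I_r}(S_r)_{I_r*}$ is the product of an
$n \times \ell$ matrix by an $\ell \times n$ matrix, whose cost is $M(n,\ell,n)$. Using Corollary 1 and
the choice of $\ell$ made by the algorithm we get that
$$\begin{aligned}
M(n,\ell,n) &\le n^{2-\alpha\beta+o(1)}\,\ell^{\beta} + n^{2+o(1)} \\
&= n^{2-\alpha\beta+o(1)} \left(\prod_{r=1}^{k} m_r\right)^{\frac{\beta}{k-1+\beta}} n^{\frac{(\alpha\beta-2)\beta}{k-1+\beta}} + n^{2+o(1)} \\
&= \left(\prod_{r=1}^{k} m_r\right)^{\frac{\beta}{k-1+\beta}} n^{\frac{(2-\alpha\beta)(k-1)}{k-1+\beta}+o(1)} + n^{2+o(1)}.
\end{aligned}$$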

Finally, it is easy to see that the cost of computing the suffix products $S_r \leftarrow
A_r\cdots A_k$, for $2 \le r \le k$, using recursive calls to the algorithm, is dominated
by the cost of the other operations performed by the algorithm. (Recall again
that $k$ is a constant.) This completes the proof of the theorem. $\Box$



There are two alternatives to the use of algorithm SCMP for computing the
product $A_1A_2\cdots A_k$. The first is to ignore the sparsity of the matrices and
multiply the matrices in $O(n^{\omega})$ time. The second is to multiply the matrices,
one by one, using the naive algorithm. As the naive algorithm uses at most
$O(mn)$ operations when one of the matrices contains only $m$ non-zero elements,
the total number of operations in this case will be $O\big((\sum_{r=1}^{k} m_r)n\big)$. For simplicity,
let us consider the case in which each one of the matrices $A_1, \ldots, A_k$ contains
$m$ non-zero elements. A simple calculation then shows that SCMP is faster than
the fast dense matrix multiplication algorithm for
$$m < n^{\frac{k-1+\omega}{k}},$$
and that it is faster than the naive matrix multiplication algorithm for
$$m > \max\left\{ n^{\frac{(k-1)(1-\alpha\beta)-\beta}{(k-1)(1-\beta)}},\; n \right\}.$$
For $k = 2$, these bounds coincide with the bounds obtained in Section 3. For
$k = 3$, with the best available bounds on $\omega$, $\alpha$ and $\beta$, we get that SCMP is the
fastest algorithm when, roughly, $n^{1.23} < m < n^{1.46}$. For smaller values of $m$ the naive
algorithm is the fastest, while for larger values of $m$ the fast dense algorithm
is the fastest. Sadly, for $k \ge 4$, with the current values of $\omega$, $\alpha$ and $\beta$, the new
algorithm never improves on both the naive and the dense algorithms. But this
may change if improved bounds on $\omega$, and especially on $\alpha$, are obtained.
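The thresholds above can be tabulated with a small Python sketch. It returns the exponent window $(x_{lo}, x_{hi})$ such that, for $m = n^{x}$, SCMP beats the naive chain when $x > x_{lo}$ and the dense algorithm when $x < x_{hi}$, using $\omega = 2.376$ and $\alpha = 0.294$ and ignoring $o(1)$ terms and constant factors. A window exists for $k = 2, 3$ but not for $k \ge 4$.

```python
def scmp_window(k, omega=2.376, alpha=0.294):
    """Exponent window for SCMP when each of the k matrices has m = n^x non-zeros.

    SCMP beats the dense algorithm for x < hi = (k - 1 + omega) / k and the naive
    chain for x > lo = max(((k-1)(1 - alpha*beta) - beta) / ((k-1)(1 - beta)), 1);
    o(1) terms and constant factors are ignored.
    """
    beta = (omega - 2) / (1 - alpha)
    hi = (k - 1 + omega) / k
    lo = max(((k - 1) * (1 - alpha * beta) - beta) / ((k - 1) * (1 - beta)), 1.0)
    return lo, hi
```

For example, `scmp_window(3)` is about `(1.235, 1.459)`, while for `k = 4` the lower end (about 1.42) already exceeds the upper end (1.344).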



5 Concluding Remarks and Open Problems 

We obtained an improved algorithm for the multiplication of two sparse matrices.
The algorithm does not rely on any specific structure of the matrices to be
multiplied, just on the fact that they are sparse. The algorithm essentially partitions
the matrices to be multiplied into a dense part and a sparse part and uses
a fast algebraic algorithm to multiply the dense parts, and the naive algorithm
to multiply the sparse parts. We also discussed the possibility of extending the
ideas to the product of $k \ge 3$ matrices. For $k = 3$ we obtained some improved results.
The new algorithms were presented for square matrices. It is not difficult,
however, to extend them to work on rectangular matrices.

The most interesting open problem is whether it is possible to speed up the
running time of other operations on sparse matrices. In particular, is it possible
to compute the transitive closure of a directed $n$-vertex $m$-edge graph in $o(mn)$
time? Is there an algorithm for the problem, for some $\varepsilon > 0$?
Abstract. The size of the Pareto curve for the bicriteria version of the knapsack
problem is polynomial on average. This has been shown for various random input
distributions. We experimentally investigate the number of Pareto optimal knapsack
fillings. Our experiments suggest that the theoretically proven upper bound
of $O(n^3)$ for uniform instances and $O(\phi\mu n^4)$ for general probability distributions
is not tight. Instead we conjecture an upper bound of $O(\phi\mu n^2)$, matching a lower
bound for adversarial weights.

In the second part we study advanced algorithmic techniques for the knapsack
problem. We combine several ideas that have been used in theoretical studies to
bound the average-case complexity of the knapsack problem. The concepts used
are simple and have been known for at least 20 years, but apparently have not
been used together. The result is a very competitive code that outperforms the best
known implementation Combo by orders of magnitude, also for harder random
knapsack instances.



1 Introduction 



Given $n$ items with positive weights $w_1, \ldots, w_n$ and profits $p_1, \ldots, p_n$ and a knapsack
capacity $c$, the knapsack problem asks for a subset $S \subseteq [n] := \{1, 2, \ldots, n\}$ such that
$\sum_{i \in S} w_i \le c$ and $\sum_{i \in S} p_i$ is maximized. Starting with the pioneering work of Dantzig
[4], the problem has been studied extensively in practice (e.g. [13,14,17]) as well as in
theory (e.g. [12,9,5,2,3]). For a comprehensive treatment we refer to the recent monograph
by Kellerer, Pferschy and Pisinger [11], which is entirely devoted to the knapsack
problem and its variations. Despite the NP-hardness of the problem [8], several large-scale
instances can be solved to optimality very efficiently. In particular, randomly generated
instances seem to be quite easy to solve. As most exact algorithms rely on a
partial enumeration of knapsack fillings, pruning the search space is an important issue
for efficiency. A standard approach is the domination concept [18]. Consider the
bicriteria version of the knapsack problem. Instead of a strict bound on the weight of
knapsack fillings, assume that the two objectives (weight and profit) are to be optimized
simultaneously, i.e., we aim at minimizing weight and maximizing profit. The set of
solutions is then $\mathcal{S} = 2^{[n]}$.

* This work was supported in part by DFG grant Vo889/1-1.
Pareto optimal solutions. A solution dominates another solution if it is better or equally
good in all objectives and strictly better in at least one objective. A solution $S$ is called
Pareto optimal or undominated if there is no other solution that dominates $S$. Let
$w(S) = \sum_{i \in S} w_i$ and $p(S) = \sum_{i \in S} p_i$ denote the weight and profit of solution $S \in \mathcal{S}$.
The set of pairs $\{(w(S), p(S)) \mid S \in \mathcal{S} \text{ is Pareto optimal}\}$ is called the Pareto points
or Pareto curve of $\mathcal{S}$. These definitions translate to the more general framework of
multiobjective combinatorial optimization [6], where the notion of Pareto optimality
plays a crucial role. In particular, the problem of enumerating all Pareto optimal solutions
is of major interest and has led to a long list of publications, compiled in [7]. For the
knapsack problem, however, enumerating Pareto optimal solutions is easy, due to the
simple combinatorial structure of the problem.

Observe that enumerating all Pareto optimal knapsack fillings essentially solves the
knapsack problem, as the most profitable Pareto optimal solution that satisfies the capacity
bound $c$ is an optimal solution. The reason is that a feasible dominated solution cannot be
better than the solution it is dominated by, which, by definition, is also feasible. Notice that
the number of Pareto points and Pareto optimal solutions can differ heavily, since many
Pareto optimal solutions can be mapped to the same Pareto point. We will sometimes
blur this distinction for the sake of a short presentation.
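A brute-force Python sketch of these definitions for tiny instances (exponential in $n$, so only for illustration): it enumerates all subsets, keeps the undominated (weight, profit) pairs, and reads the knapsack optimum off the Pareto curve as the most profitable point within the capacity.

```python
from itertools import combinations


def pareto_points(weights, profits):
    """Brute-force Pareto curve: (weight, profit) pairs of undominated subsets."""
    n = len(weights)
    points = {(sum(weights[i] for i in S), sum(profits[i] for i in S))
              for size in range(n + 1) for S in combinations(range(n), size)}
    # A point is dominated if another point is no heavier, no less profitable, and differs.
    return sorted(pt for pt in points
                  if not any(q[0] <= pt[0] and q[1] >= pt[1] and q != pt for q in points))


def best_feasible(weights, profits, capacity):
    """Optimal knapsack value read off the Pareto curve: most profitable point within c."""
    return max(p for w, p in pareto_points(weights, profits) if w <= capacity)
```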



The Nemhauser/Ullmann algorithm. In 1969, Nemhauser and Ullmann [16] introduced
the following elegant algorithm that computes a list of all Pareto points in an iterative
manner. In the literature (e.g. [11]) the algorithm is sometimes referred to as “Dynamic
programming with lists”. For $i \in [n]$, let $L_i$ be the list of all Pareto points over the solution
set $\mathcal{S}_i = 2^{[i]}$, i.e., $\mathcal{S}_i$ contains all subsets of the items $1, \ldots, i$. Recall that each Pareto point
is a (weight, profit) pair. The points in $L_i$ are assumed to be listed in increasing order of
their weights. Clearly, $L_1 = ((0,0), (w_1,p_1))$. The list $L_{i+1}$ can be computed from $L_i$
as follows: Create a new ordered list $L'_i$ by duplicating $L_i$ and adding $(w_{i+1}, p_{i+1})$ to
each point. Now we merge the two lists into $L_{i+1}$ obeying the weight order of subsets.
Finally, those solutions are removed from the list that are dominated by other solutions
in the list. The list $L_i$ can be calculated from the list $L_{i-1}$ in time that is linear in the
length of $L_{i-1}$. Define $q(i) = |L_i|$. Thus, the Nemhauser/Ullmann algorithm computes
the Pareto curve in time $O\left(\sum_{i=1}^{n} q(i-1)\right)$.
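A minimal Python sketch of the Nemhauser/Ullmann iteration, assuming numeric weights and profits; `heapq.merge` performs the linear-time merge of $L_i$ and $L'_i$, and a single scan removes dominated points. It returns only the Pareto points; the per-point item bits used for reconstructing fillings are omitted.

```python
import heapq


def nemhauser_ullmann(weights, profits):
    """Pareto curve of a knapsack instance via 'dynamic programming with lists'.

    The list holds (weight, profit) pairs sorted by increasing weight; after
    pruning, profits strictly increase along it, so no point dominates another.
    """
    L = [(0, 0)]                                         # Pareto points of the empty set
    for w_i, p_i in zip(weights, profits):
        shifted = [(w + w_i, p + p_i) for w, p in L]     # L'_i: add item i to every point
        # Linear-time merge by weight; among equal weights the higher profit comes first.
        merged = heapq.merge(L, shifted, key=lambda pt: (pt[0], -pt[1]))
        L = []
        for pt in merged:                                # one scan drops dominated points
            if not L or pt[1] > L[-1][1]:
                L.append(pt)
    return L
```

The knapsack optimum for capacity `c` is then `max(p for w, p in nemhauser_ullmann(weights, profits) if w <= c)`.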

In order to extract the knapsack fillings corresponding to the computed Pareto points,
we maintain a bit for each Pareto point in $L_i$ indicating whether or not item $i$ is contained
in the corresponding knapsack filling. The knapsack filling can then be reconstructed
recursively using the standard technique from dynamic programming (see e.g. [11]). In a
recent theoretical study, the average number of Pareto points for the knapsack problem
has been proved to be polynomial under various random input distributions.

Theorem 1. ([2]) Suppose the weights are arbitrary positive numbers. Let $q$ denote
the number of Pareto points over all $n$ items and let $T$ denote the running time of the
Nemhauser/Ullmann algorithm.

- If profits are chosen according to the uniform distribution over $[0,1]$, then $\mathbf{E}[q] =
O(n^3)$ and $\mathbf{E}[T] = O(n^4)$.
- For every $i \in [n]$, let profit $p_i$ be a non-negative random variable with density function
$f_i : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$. Suppose $\mu = \max_{i \in [n]} \mathbf{E}[p_i]$ and $\phi = \max_{i \in [n]} \sup_{x \in \mathbb{R}_{\ge 0}} f_i(x)$.
Then $\mathbf{E}[q] = O(\phi\mu n^4)$ and $\mathbf{E}[T] = O(\phi\mu n^5)$.



In fact, the proof technique can easily be adapted to show a more general theorem
[1], which bounds the expected number of Pareto points over an arbitrary collection of
subsets of $n$ elements (items) and which holds for any optimization direction for the
two objectives. This way, it applies to general bicriteria problems with linear objective
functions for which the solution set of an instance can be described by a collection of
subsets over a ground set. As an example, consider the bicriteria shortest path problem.
The ground set is the set of edges and solutions are elementary $s,t$-paths, hence, subsets
of edges. This generalized theorem has been applied to a simple algorithm enumerating
all Pareto optimal $s,t$-paths and has thus led to the first average-case analysis [1] for the
constrained shortest path problem [15]. We believe that the generalized version of the
theorem holds much potential for improving and simplifying probabilistic analyses for
other combinatorial bicriteria problems and their variations.

An interesting question concerns the tightness of the upper bounds. There exists a
lower bound of $n^2/4$ for non-decreasing density functions for knapsack instances using
exponentially growing weights [2]. This leaves a gap of order $n$ for the uniform distribution
and order $n^2$ for general distributions. In the first part of this experimental study
we investigate the number of Pareto points under various random input distributions.
Even though the experiments are limited to knapsack instances and the generality of the
upper bounds (adversarial weights, arbitrary probability distributions, different distributions
for different items) hardly allows a qualitative confirmation of the bounds, we
found some evidence that the lower bound is tight, which in turn opens the chance of
improving the upper bounds.

In the second part we study the influence of various algorithmic concepts on the 
efficiency of knapsack algorithms. A combination of all considered ideas led to a very 
competitive code. Interestingly, all concepts used are long known (at least since 1984) 
but apparently have not been used together in an implementation. 

The most influential algorithmic idea is the core concept. Core algorithms make use 
of the fact that the optimal integral solution is usually close to the optimal fractional one 
in the sense that only few items need to be exchanged in order to transform one into 
the other. Obtaining an optimal fractional solution is computationally very inexpensive. 
Based on this solution, a subset of items, called the core, is selected. The core problem 
is to find an optimal solution under the restriction that items outside the core are fixed 
to the value prescribed by the optimal fractional solution. The hope is that a small 
core is sufficient to obtain an optimal solution for the unrestricted problem. The core 
problem itself is again a knapsack problem with a different capacity bound and a subset 
of the original items. Intuitively, the core should contain those items for which it is 
hard to decide whether or not they are part of an optimal knapsack filling. The core 
concept leaves much space for variations and heuristics, most notably, the selection of 
the core items. A common approach selects items whose profit-to-weight ratio is closest 
to the profit-to-weight ratio of the break item [11]. We, however, follow the definition of 
Goldberg and Marchetti-Spaccamela [9], who introduce the notion of the loss of items. 
Fig. 1. Items correspond to points in the
xy-plane. The ray starting from the origin
through the break item is called the Dantzig ray.
Items above the Dantzig ray are included in
the break solution, items below the Dantzig
ray are excluded. The loss of an item is the
vertical distance to the Dantzig ray. The core
region (shaded area) is a stripe around the ray.
It grows dynamically until the vertical thickness
has reached the size of the integrality
gap $\Gamma$. Items outside the core are fixed to the
value prescribed by the break solution.




Let us first describe the greedy algorithm for computing an optimal fractional solution.
Starting with the empty knapsack, we add items one by one in order of non-increasing
profit-to-weight ratio. The algorithm stops when it encounters the first item
that would exceed the capacity bound $c$. This item is called the break item, and the
computed knapsack filling, not including the break item, is called the break solution.
Finally, we fill the residual capacity with an appropriate fraction $f < 1$ of the break item.
Let $\bar{X} = (\bar{x}_1, \ldots, \bar{x}_n)$ be the binary solution vector of this solution (or any other
optimal fractional solution). For any feasible integer solution $X = (x_1, \ldots, x_n)$, define
$ch(X) = \{i \in [n] : x_i \neq \bar{x}_i\}$, i.e., $ch(X)$ is the set of items on which $X$ and $\bar{X}$ do not
agree. When removing an item from $\bar{X}$, the freed capacity can be used to include other
items which have profit-to-weight ratio at most that of the break item. A different
profit-to-weight ratio of the exchanged items causes a loss in profit that can only be compensated
by better utilizing the knapsack capacity. This loss can be quantified as follows:

Definition 1. Let $r$ denote the profit-to-weight ratio of the break item. Define the loss of
item $i$ to be $\mathrm{loss}(i) = |p_i - r w_i|$.

Goldberg and Marchetti-Spaccamela [9] proved that the difference in profit between
any feasible integer solution $X = (x_1, \ldots, x_n)$ and the optimal fractional solution $\bar{X}$
can be expressed as
$$p(\bar{X}) - p(X) = r\left(c - \sum_{i=1}^{n} x_i w_i\right) + \sum_{i \in ch(X)} \mathrm{loss}(i).$$
The first term on the right-hand side corresponds to an unused capacity of the integer
solution $X$, whereas the second term is due to the accumulated loss of all items in
$ch(X)$. Let $X^*$ denote an arbitrary optimal integral solution. The integrality gap
$\Gamma = p(\bar{X}) - p(X^*)$ gives an upper bound for the accumulated loss of all items in
$ch(X^*)$, and therefore also for the loss of each individual item in $ch(X^*)$.
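The identity of Goldberg and Marchetti-Spaccamela can be verified exhaustively on a small instance. In the Python sketch below, the greedy procedure computes the break solution $\bar{X}$, the ratio $r$ of the break item and the fractional optimum; exact rationals (`fractions.Fraction`) avoid rounding. The instance used in the check is an arbitrary illustrative choice, and the sketch assumes that not all items fit, so that a break item exists.

```python
from fractions import Fraction


def break_solution(weights, profits, capacity):
    """Greedy fractional knapsack: returns (x_bar, r, fractional optimum value).

    x_bar is the break solution as a 0/1 list, r the profit-to-weight ratio of
    the break item. Assumes integer weights and that not all items fit.
    """
    order = sorted(range(len(weights)), key=lambda i: Fraction(profits[i], weights[i]),
                   reverse=True)
    x_bar, used, value = [0] * len(weights), 0, Fraction(0)
    for i in order:
        if used + weights[i] > capacity:          # i is the break item
            r = Fraction(profits[i], weights[i])
            return x_bar, r, value + r * (capacity - used)
        x_bar[i], used, value = 1, used + weights[i], value + profits[i]
    raise ValueError("all items fit: no break item")


def loss(i, weights, profits, r):
    """Definition 1: vertical distance of item i to the Dantzig ray."""
    return abs(profits[i] - r * weights[i])


def gap_terms(x, weights, profits, capacity):
    """Both sides of p(X_bar) - p(X) = r(c - sum x_i w_i) + sum_{i in ch(X)} loss(i)."""
    x_bar, r, frac_value = break_solution(weights, profits, capacity)
    lhs = frac_value - sum(p for p, xi in zip(profits, x) if xi)
    used = sum(w for w, xi in zip(weights, x) if xi)
    ch = [i for i in range(len(x)) if x[i] != x_bar[i]]
    rhs = r * (capacity - used) + sum(loss(i, weights, profits, r) for i in ch)
    return lhs, rhs
```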

Fact 1. For any optimal integer solution $X^*$ it holds that $\sum_{i \in ch(X^*)} \mathrm{loss}(i) \le \Gamma$.

Thus, items with loss larger than $\Gamma$ must agree in $\bar{X}$ and $X^*$. Consequently, we define
the core to consist of all elements with loss at most $\Gamma$. Since we do not know $\Gamma$, we
use a dynamic core, adding items one by one in order of increasing loss and computing
a new optimal solution whenever we extend the core. We stop when the next item has
loss larger than the current integrality gap, that is, the gap in profit between $\bar{X}$ and the
best integer solution found so far. Figure 1 illustrates the definitions of the core.

Loss filter. Goldberg and Marchetti-Spaccamela further exploit Fact 1. Observe that solving the core problem is equivalent to finding the set $\mathrm{ch}(X^*)$ for some optimal solution $X^*$. Let $I_B$ denote the set of core items included in the break solution. We solve a core problem relative to the break solution, that is, including an item from $I_B$ into a core solution corresponds to removing it from the break solution. Hence, in the core problem, items in $I_B$ have negative weight and profit. The capacity of this core problem is the residual capacity of the break solution. The loss of a core solution $X$ is $\mathrm{loss}(X) = \sum_i \mathrm{loss}(i)\, x_i$. According to Fact 1, solutions with loss larger than the integrality gap (or any upper bound on it, e.g., the one provided by the best integer solution found so far) cannot be optimal and, furthermore, cannot be extended to an optimal solution by adding more core items. These solutions are filtered out in the enumeration process.
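The loss of a core solution can be read off its total weight and profit once the items of $I_B$ are negated. If every item of $I_B$ has ratio at least $r$ and every other item has ratio at most $r$ (which holds for a greedy break solution), then $\sum_i \mathrm{loss}(i)\,x_i = r\,w(X) - p(X)$ in these relative coordinates. The Python sketch below is our own illustration of this observation and of the filter test; the helper names are hypothetical.

```python
from fractions import Fraction


def relative_items(weights, profits, break_set):
    """Core items relative to the break solution: items in the break set
    are negated, since selecting them means removing them."""
    return [(-w, -p) if i in break_set else (w, p)
            for i, (w, p) in enumerate(zip(weights, profits))]


def core_solution_loss(items, chosen, r):
    """loss(X) = sum of loss(i) over chosen core items, computed as r*w - p."""
    w = sum(items[i][0] for i in chosen)
    p = sum(items[i][1] for i in chosen)
    return r * w - p


def survives_loss_filter(items, chosen, r, gap_bound):
    """Keep a (partial) core solution only if its loss does not exceed an
    upper bound on the integrality gap. Losses are nonnegative, so adding
    further items can never bring a filtered solution back below the bound."""
    return core_solution_loss(items, chosen, r) <= gap_bound


# Toy instance: break set {0, 2}, break ratio r = 1, losses 4, 0, 3, 0.
weights, profits, r = [4, 3, 2, 5], [8, 3, 5, 5], Fraction(1)
items = relative_items(weights, profits, {0, 2})
# Removing item 2 from the break solution and adding item 1:
# loss = loss(2) + loss(1) = 3 + 0 = 3.
loss_12 = core_solution_loss(items, [1, 2], r)
```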

Horowitz and Sahni decomposition. Horowitz and Sahni [10] present a variation of the Nemhauser/Ullmann algorithm utilizing two lists of Pareto points. Assume we partition the set of items into $I$ and $I'$. Let $\mathcal{L}$ and $\mathcal{L}'$ denote the sorted lists of Pareto points over $I$ and $I'$, respectively. Each Pareto point over the complete item set $I \cup I'$ can be constructed by combining two Pareto points from $\mathcal{L}$ and $\mathcal{L}'$ (i.e., by unifying the corresponding subsets of items). Instead of checking all pairs $(p, q) \in \mathcal{L} \times \mathcal{L}'$, it suffices to find for each $p \in \mathcal{L}$ the most profitable $q \in \mathcal{L}'$ such that the combination of $p$ and $q$ is still feasible. Since the lists are sorted, the $p \in \mathcal{L}$ with the most profitable combination can be found by a single parallel scan of the two lists in opposite directions.
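The parallel scan can be sketched in a few lines of Python (a hypothetical helper, not the paper's code). It relies on the fact that profit increases with weight along a Pareto list, so the heaviest feasible partner of $p$ is also its most profitable one; since the pointer into the second list only moves backwards, the scan costs $O(|\mathcal{L}| + |\mathcal{L}'|)$.

```python
def best_combination(L1, L2, capacity):
    """Horowitz/Sahni-style combination of two Pareto lists.

    L1, L2 -- Pareto points (weight, profit) sorted by increasing weight;
              on a Pareto list profit increases with weight as well.
    Returns the most profitable feasible (weight, profit) obtained by adding
    one point of L1 and one point of L2, or None if no pair fits.
    """
    best = None
    j = len(L2) - 1                 # L2 is scanned from heavy to light
    for w1, p1 in L1:               # L1 is scanned from light to heavy
        while j >= 0 and w1 + L2[j][0] > capacity:
            j -= 1
        if j < 0:                   # heavier points of L1 cannot fit either
            break
        w, p = w1 + L2[j][0], p1 + L2[j][1]
        if best is None or p > best[1]:
            best = (w, p)
    return best


# (2, 3) + (4, 6) and (5, 7) + (1, 2) both reach profit 9 within capacity 6.
result = best_combination([(0, 0), (2, 3), (5, 7)], [(0, 0), (1, 2), (4, 6)], 6)
```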

1.1 Experimental Setup 

In order to avoid effects that are caused by the pseudopolynomial complexity of the 
algorithms we used a large integer domain of [0,2^° — 1] for the weights and profits 
of items. The implementation uses 64-bit integer arithmetic (C++ type long long). All programs have been compiled under SunOS 5.9 using the GNU compiler g++, version 3.2.1, with option “-O4”. All experiments have been performed on a Sun Fire 15K with 176 GB of main memory and 900 MHz SPARC III+ CPUs (SPECint2000 = 535 for one processor, according to www.specbench.org).

Random input distributions. Following the definition in [11], $X \sim \mathcal{U}[l, r]$ means that the random variable $X$ is chosen independently uniformly at random from the interval $[l, r]$. We investigate the following input distributions:

Uniform instances: $w_i \sim \mathcal{U}[0, 1]$ and $p_i \sim \mathcal{U}[0, 1]$.

δ-correlated instances: $w_i \sim \mathcal{U}[0, 1]$ and $p_i \sim w_i + r$ with $r \sim \mathcal{U}[-\delta/2, \delta/2]$ for a given $0 < \delta \le 1$. Notice that δ-correlated instances are a parameterized version of the weakly correlated instances used in former studies (e.g. [11,13,17,14]). The latter correspond to δ-correlated instances with $\delta = 1/10$. The corresponding density function of the profits is upper bounded by $\phi = 1/\delta$.
Fig. 2. Classes of test instances: Items are sampled uniformly from the shaded area. To obtain integers, we scale each number by R and round to the next integer.



Instances with similar weight: $w_i \sim \mathcal{U}[1-\varepsilon, 1]$ and $p_i \sim \mathcal{U}[0, 1]$.

Instances with similar profit: $w_i \sim \mathcal{U}[0, 1]$ and $p_i \sim \mathcal{U}[1-\varepsilon, 1]$.

In order to obtain integer weights and profits that can be processed by our algorithm, we scale all values by $R$ and round to the next integer. In this study we used the same value of $R$ for all experiments. These definitions are illustrated in Figure 2. Observe that for δ-correlated instances some items can have negative profits. This affects a fraction $\delta/8$ of all items on average (a profit is negative exactly when $r < -w_i$, which happens with probability $\int_0^{\delta/2} \frac{t}{\delta}\,dt = \delta/8$). These items have no effect on the Pareto curve. Nevertheless, they are processed by the Nemhauser/Ullmann algorithm. The reason to include these instances in our study is a theoretical analysis of a core algorithm [3] for this class of instances. The capacity $c$ of the knapsack is set to $\beta \cdot \sum_{i=1}^{n} w_i$ for a parameter $\beta < 1$ specified for each experiment. If not indicated otherwise, each data point represents the average over $T$ trials, where $T$ is a parameter specified for each experiment.
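The four input classes can be generated as in the following Python sketch. It is our own illustration: the function name and the string labels for the classes are hypothetical, and $R$ is left as a required parameter rather than fixed to the value used in the study.

```python
import random


def random_instance(n, kind, R, beta=0.4, delta=0.1, eps=0.1, seed=None):
    """Sketch of the four input classes (hypothetical helper).

    kind: "uniform", "delta" (delta-correlated), "similar_weight" or
    "similar_profit". Values are scaled by R and rounded to integers, and the
    capacity is beta times the total weight.
    """
    rng = random.Random(seed)
    weights, profits = [], []
    for _ in range(n):
        if kind == "uniform":
            w, p = rng.uniform(0, 1), rng.uniform(0, 1)
        elif kind == "delta":
            w = rng.uniform(0, 1)
            p = w + rng.uniform(-delta / 2, delta / 2)   # may be negative
        elif kind == "similar_weight":
            w, p = rng.uniform(1 - eps, 1), rng.uniform(0, 1)
        elif kind == "similar_profit":
            w, p = rng.uniform(0, 1), rng.uniform(1 - eps, 1)
        else:
            raise ValueError(f"unknown instance class: {kind}")
        weights.append(round(w * R))
        profits.append(round(p * R))
    capacity = int(beta * sum(weights))
    return weights, profits, capacity
```

Sampling many δ-correlated items with δ = 1 and counting negative profits is a quick sanity check of the δ/8 fraction stated above.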



2 Number of Pareto Points 

In the following, $q(n)$ denotes the (average) number of Pareto points over $n$ items computed by the Nemhauser/Ullmann algorithm.
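For reference, the Nemhauser/Ullmann algorithm can be sketched in Python as follows (our own illustration, not the C++ implementation used in the experiments). Each round merges the current Pareto list with a copy shifted by the new item and drops dominated points; the returned list has strictly increasing weight and profit, so $q(n)$ is its length.

```python
import heapq


def pareto_curve(items):
    """Nemhauser/Ullmann: Pareto-optimal (weight, profit) pairs of all subsets.

    items -- iterable of (weight, profit). The returned list is sorted by
    strictly increasing weight and strictly increasing profit.
    """
    curve = [(0, 0)]                                      # the empty knapsack
    for w, p in items:
        shifted = [(cw + w, cp + p) for cw, cp in curve]  # add the new item
        # Both lists are sorted by weight, so a linear merge suffices.
        merged = heapq.merge(curve, shifted)
        curve = []
        for cw, cp in merged:
            if curve and cp <= curve[-1][1]:
                continue                  # dominated by a lighter point
            if curve and curve[-1][0] == cw:
                curve.pop()               # same weight, higher profit wins
            curve.append((cw, cp))
    return curve


# (3, 2) and (4, 3) are dominated by (2, 3) and (3, 4), respectively.
curve = pareto_curve([(1, 1), (2, 3), (3, 2)])
q = len(curve)
```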

First we tested uniform instances (Figure 3a). We observe that the ratio $q(n)/n^2$ is decreasing in $n$, indicating that $q(n) = O(n^2)$. As the decrease might be caused by lower order terms, the result could be tight. The next experiment, for instances with similar weight, gives some evidence for the contrary. Notice that by choosing $\varepsilon = 1$, we obtain uniform instances. Figure 3b shows the dependence of the average number of Pareto points on $\varepsilon$ for instances with similar weight. When decreasing $\varepsilon$, $q(n,\varepsilon)$ first increases about linearly but stays constant when $\varepsilon$ falls below some value that depends on $n$. At this value, subsets of different cardinality are well separated in the Pareto curve, i.e., if $\mathcal{P}_i$ denotes the set of Pareto optimal solutions with cardinality $i$, then all knapsack fillings in $\mathcal{P}_i$ have less weight (and less profit) than any solution in $\mathcal{P}_{i+1}$. For $\varepsilon = 1/n$, subsets are obviously separated, but the effect occurs already for larger values of $\varepsilon$. A similar separation of subsets was used in the proof of the lower bound of $n^2/4$. The ratio $q(n,\varepsilon)/q(n,1)$ apparently converges to $c \cdot \log(n)$ for $\varepsilon \to 0$ and some $c \in \mathbb{R}_{>0}$. In this case, the separation would cause a logarithmic increase of $q(n,\varepsilon)$ compared to uniform instances. As our data suggests that $\mathbf{E}[q(n,\varepsilon)] = O(n^2)$ for instances with similar weight using $\varepsilon = 1/n$ (see Figure 3a), the asymptotic behavior of $q(n)$ for uniform instances
Fig. 3. Average number of Pareto points. a) Uniform instances and instances with similar weight using $\varepsilon = 1/n$, $T > 2000$: the ratio $q(n)/n^2$ decreases in $n$, indicating that $q(n) = O(n^2)$. b) Instances with similar weight: $q(n,\varepsilon)/q(n,1)$ converges to $c \cdot \log(n)$ for $\varepsilon \to 0$. c) δ-correlated instances: $\delta \cdot q(n)$ decreases in $1/\delta$, indicating that $q(n) = O(n^2/\delta)$. d) Uniform instances: tail functions for $q(n)$; the graph shows the fraction of experiments with $q(n) > x$, $n = 200$. We ran 5 identical experiments to examine the random deviations.



should be smaller than $n^2$ by at least a logarithmic factor, i.e., there is evidence that $\mathbf{E}[q(n)] = O(n^2/\log n)$. The experimental data, however, does not allow any conclusion whether or not $q(n,\varepsilon) = \Theta(n^2)$ for $\varepsilon = 1/n$. Notice that the results for instances with similar profit and instances with similar weight are the same. This is due to the fact that, for any given knapsack instance, exchanging the weight and profit of each item has no influence on the Pareto curve of that instance [1].

Next we consider δ-correlated instances. As the density parameter for the profit distributions is $\phi = 1/\delta$, applying Theorem 1 yields a bound of $O(n^4/\delta)$. Figure 3c shows that $\delta \cdot q(n,\delta)$ decreases in $\phi = 1/\delta$. Since uniform instances and δ-correlated instances are quite similar for large δ, the data suggests an asymptotic behavior of $q(n,\delta) = O(n^2/\delta)$. We conjecture that a separation of subsets with different cardinality would increase the average number of Pareto points by a logarithmic factor $\log(n/\delta)$ and that $O(n^2/\delta) = O(\phi n^2)$ is a tight bound for adversarial weights.

Finally, we investigate the distribution of the random variable $q(n)$ for uniform instances. Theorem 1 bounds the expected number of Pareto points. Applying the Markov inequality yields only a weak tail bound: for all $\alpha \in \mathbb{R}_{>0}$, $\Pr[q(n) \ge \alpha\,\mathbf{E}[q(n)]] \le 1/\alpha$.



Fig. 4. a) Average integrality gap $\Gamma$ for uniform instances. b) Tail function of $\Gamma$ for uniform instances with $n = 10000$ items: graph $f(x)$ gives the fraction of experiments with $\Gamma > x$. c) Average integrality gap $\Gamma$ for δ-correlated instances. d) Random instances with similar profit: average number of core items as a function of $\varepsilon$. We used $T = 10000$, $\beta = 0.4$ for all experiments.



Figure 3d shows a much faster decay of the probability (notice the log scale). For example, the probability that $q(200)$ is larger than three times the average is about 10“® instead of 1/3. The experiments suggest that the tail decreases faster than a polynomial $1/\alpha^c$ but possibly slower than an exponential $1/c^\alpha$.

In all our experiments (not all of which we can present here) we have not found a distribution where $q(n)$ was larger than $\phi\mu n^2$ on average, where $\phi$ and $\mu$ denote the maximum density and the maximum expectation over the profit distributions of all items, respectively. This gives us some hope that the upper bound of $O(\phi\mu n^4)$ can be improved.

3 Properties of Random Knapsack Instances 

The probabilistic analyses in [9] and [3] are based (among other results) on the upper bound for the integrality gap $\Gamma$. Lueker [12] proved that $\mathbf{E}[\Gamma] = O(\frac{\log^2 n}{n})$, provided the weights and profits are drawn uniformly at random from $[0, 1]$. Lueker also gave a lower bound of $\Omega(\frac{1}{n})$, which was later improved to $\Omega(\frac{\log^2 n}{n})$ by Goldberg and Marchetti-Spaccamela [9]. A complete proof of the last result, however, was never published. Our experiments are not sufficient to strongly support this lower bound. For an affirmative answer, the curve in Figure 4a would need to converge to a constant $c > 0$ (notice the log scale for the number of items). Our experiments suggest that in this case the constant $c$ would be at most 1/12. The small constants involved also account for the moderate number of core items when using the core definition of Goldberg and Marchetti-Spaccamela. The average number of core items in our experiments was 20.2 for uniformly random instances with 1000 items and 67.5 for instances with 10® items.

Goldberg and Marchetti-Spaccamela show that Lueker’s analysis for the upper bound of $\Gamma$ can easily be generalized to obtain exponential tail bounds on $\Gamma$: there is a constant $c_0$ such that for every $\alpha \le \log^2 n$, $\Pr[\Gamma > \alpha c_0 (\log n)^2/n] \le 2^{-\alpha}$. Our experiments show an even more rapid decay (Figure 4b), also in terms of absolute values: we performed 10® experiments using $n = 10000$ items. The average integrality gap was A = 1.695 · 10“®. The fraction of experiments in which $\Gamma$ was larger than 2A was 4.5 · 10“^. Only in three experiments was $\Gamma$ larger than 3A, and the largest $\Gamma$ was about 3.16A. Adapting Lueker’s proof technique, a similar bound for the integrality gap of δ-correlated instances was proven in [3]: $\mathbf{E}[\Gamma] = O(\frac{\delta}{n} \log^2 \frac{n}{\delta})$. The data in Figure 4c complies with the theorem and, as in the uniform case, gives a weak indication that the bound could be tight, as the curves show some tendency to converge to a constant $c > 0$.
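Measuring $\Gamma$ requires both the fractional and the integer optimum. For tiny integer instances this can be sketched in Python as below; this is our own illustration, the helper names are hypothetical, and the pseudopolynomial dynamic program is only a stand-in for checking small examples, not the core algorithm used in the experiments.

```python
from fractions import Fraction


def fractional_optimum(weights, profits, capacity):
    """Value of the LP relaxation (Dantzig bound): greedy by ratio, with a
    fractional part of the break item. Assumes positive weights and profits."""
    order = sorted(range(len(weights)),
                   key=lambda i: Fraction(profits[i], weights[i]), reverse=True)
    value, remaining = Fraction(0), capacity
    for i in order:
        if weights[i] <= remaining:
            value += profits[i]
            remaining -= weights[i]
        else:
            value += Fraction(profits[i] * remaining, weights[i])
            break
    return value


def integer_optimum(weights, profits, capacity):
    """Exact 0/1 optimum by dynamic programming over capacities
    (pseudopolynomial, only meant for tiny examples)."""
    best = [0] * (capacity + 1)
    for w, p in zip(weights, profits):
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + p)
    return best[capacity]


def integrality_gap(weights, profits, capacity):
    """Gamma = p(X_bar) - p(X*)."""
    return (fractional_optimum(weights, profits, capacity)
            - integer_optimum(weights, profits, capacity))


# Fractional optimum 14 (one third of item 1 is used), integer optimum 13.
gamma = integrality_gap([4, 3, 2, 5], [8, 3, 5, 5], 7)
```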

Finally, consider instances with similar profit. Instead of the integrality gap we investigated the number of core items. Recall that Goldberg and Marchetti-Spaccamela’s definition of the core closely relates the two quantities. In particular, the core includes all items whose vertical distance to the Dantzig ray is at most $\Gamma$. Figure 4d shows that the core contains more than half of all items when $\varepsilon$ becomes small. The reason is that items close to the Dantzig ray have a very similar weight. If the remaining capacity of the knapsack is not close to the weight (or a multiple of the weight) of these items, it is difficult to fill the knapsack close to its capacity. Therefore, items more distant from the Dantzig ray have to be used to fill the knapsack, but they exhibit a larger loss. As a consequence, the integrality gap is much larger on average, and so is the number of core items.



4 Efficiency of Different Implementations 



Let us first describe our implementation in more detail. First we compute the break solution in linear time. Then we use a dynamic core, adding items in order of increasing loss. The capacity of the core problem is the residual capacity of the break solution, and items included in the break solution become negative, that is, we negate the weight and profit of those items. The core problem is solved using the Nemhauser/Ullmann algorithm, except for the variant that uses only the loss filter, as analyzed in [9]. The loss filter can easily be incorporated into the Nemhauser/Ullmann algorithm, as the loss of a Pareto point is determined by its weight and profit: a point $(w, p)$ of the core problem has loss $r w - p$, its vertical distance to the line of slope $r$ (see Figure 5f). Pareto points whose loss is larger than the “current” integrality gap are deleted from the list. Observe that all solutions dominated by such a filtered Pareto point have an even larger loss. Hence, the domination concept and the loss filter are “compatible”. When using two lists (Horowitz and Sahni decomposition), we alternately extend the lists with the next item and check for a new optimal solution in each such iteration.
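The following Python sketch puts these pieces together for a single Pareto list, without the Horowitz and Sahni decomposition and without the additional heuristics. It is our own illustration under stated assumptions (positive integer weights and profits, exact rational arithmetic), not the authors' implementation, and the name `core_knapsack` is hypothetical. The stopping rule uses the current gap bound between the fractional optimum and the best integer solution found so far, as described above.

```python
from fractions import Fraction
import heapq


def core_knapsack(weights, profits, capacity):
    """Optimal 0/1 knapsack value via a dynamic core, a Nemhauser/Ullmann
    list relative to the break solution, domination and the loss filter.
    Assumes positive integer weights and profits."""
    n = len(weights)
    ratio = [Fraction(profits[i], weights[i]) for i in range(n)]
    order = sorted(range(n), key=lambda j: ratio[j], reverse=True)
    residual, base, in_break, r = capacity, 0, set(), None
    for i in order:                       # greedy break solution
        if weights[i] > residual:
            r = ratio[i]                  # ratio of the break item
            break
        in_break.add(i)
        residual -= weights[i]
        base += profits[i]
    if r is None:
        return base                       # every item fits
    upper = base + r * residual           # value of the optimal fractional solution
    loss = [abs(profits[i] - r * weights[i]) for i in range(n)]
    curve, best = [(0, 0)], base          # Pareto list of the core problem
    for i in sorted(range(n), key=lambda j: loss[j]):
        gap = upper - best                # upper bound on the integrality gap
        if loss[i] > gap:
            break                         # dynamic core: stop extending
        w, p = weights[i], profits[i]
        if i in in_break:
            w, p = -w, -p                 # selecting it removes it from the break solution
        merged = heapq.merge(curve, [(cw + w, cp + p) for cw, cp in curve])
        curve = []
        for cw, cp in merged:
            if r * cw - cp > gap:         # loss filter: loss of (cw, cp) is r*cw - cp
                continue
            if curve and cp <= curve[-1][1]:
                continue                  # dominated
            if curve and curve[-1][0] == cw:
                curve.pop()
            curve.append((cw, cp))
        for cw, cp in curve:              # best feasible core solution so far
            if cw <= residual:
                best = max(best, base + cp)
    return best
```

On the small instance with weights (4, 3, 2, 5), profits (8, 3, 5, 5) and capacity 7, the sketch stops after two core items, because the third item has loss 3 while the gap bound is already 1.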
Fig. 5. Efficiency of different implementations: a) work and b) running time for uniform instances as a function of $n$. c) work and d) running time for δ-correlated instances as a function of $1/\delta$ using $n = 10000$. e) Uniform instances with similar profit using $n = 10000$. Each curve represents an implementation that uses a subset of the features as indicated: PO = filtering dominated solutions, Loss = loss filter, 2 = two lists, H = additional heuristics, combo = Combo code. Each data point represents the average over $T = 1000$ experiments. We used $\beta = 0.4$. The given running times for uniform and δ-correlated instances do not include the linear time preprocessing (the generation of the random instance, finding the optimal fractional solution, partitioning the set of items according to the loss of items), as it dominates the running time for easy instances.

f) Pareto curve of the core problem as a step function. Items included in the break solution are negated, as we solve a core problem relative to the break solution. The slope of line $\ell$ is the profit-to-weight ratio of the break item. Solutions in the shaded area (points whose vertical distance to $\ell$ is larger than $\Gamma$) are filtered out by the loss filter. Observe that if a Pareto point $(w, p)$ is filtered out by the loss filter, then all solutions dominated by $(w, p)$ are in the shaded area as well.
Additional heuristics. Instead of alternately extending the two lists, we first extend only one list, which grows exponentially at the beginning. Later we use the second list. The idea is that the first list is based on items with small loss, such that the loss filter hardly has an effect. The items used in the second list are expected to have larger loss, such that the loss filter keeps the list relatively small and the extension of the list with new items is fast. We use lazy evaluation, i.e., we do not scan the two lists in each iteration to find new optimal solutions, as we expect the first list to be significantly larger than the second. Knowing the lengths of the lists, we balance the work for extending a list and scanning both lists. Finally, we delete Pareto points from the list that cannot be extended by forthcoming items, as their loss is too large.

We investigate the influence of the different concepts (domination concept, loss filter, Horowitz and Sahni decomposition and additional heuristics) on the running time of our algorithm, under different random input distributions. We compare our implementation to Combo, a knapsack solver due to Pisinger [13]. Combo combines various advanced algorithmic techniques that have been independently developed to cope with specifically designed difficult test instances. For instance, it applies cardinality constraints generated from extended covers using Lagrangian relaxation and rudimentary divisibility tests, and it pairs dynamic programming states with items outside the core. Our implementation, however, is rather simple but proves to be very efficient in reducing the number of enumerated knapsack fillings for a given core. In particular, we add items to the core strictly in order of increasing loss. This can lead to a very large core, as shown in Figure 4d. We observed that whenever Combo was superior to our implementation, the core size of Combo was usually significantly smaller. So we consider our implementation not as a competing knapsack solver but rather as a study that points out possible improvements.

Due to technical difficulties with measuring small amounts of processing time and huge differences in the running times, we employ, besides the running time, another measure called work. Each basic step takes one unit of work. This way, an iteration of the Nemhauser/Ullmann algorithm with input list $\mathcal{L}$ costs $2|\mathcal{L}|$ work units. Finding an optimal solution by scanning two lists $\mathcal{L}_1$ and $\mathcal{L}_2$ in parallel (Horowitz and Sahni decomposition) costs $|\mathcal{L}_1| + |\mathcal{L}_2|$ work units.

Figure 5 shows the work and running time for different implementations. We expect the work as well as the running time to increase quasi-linearly in $n$ and $1/\delta$ for uniform and δ-correlated instances, respectively (see [3] for a theoretical result). For an easier comparison, we divided work and running time by $n$ and $1/\delta$, accordingly. Notice that almost all scales are logarithmic.

In contrast to the theoretical analyses of core algorithms in [9] and [3], where the domination concept leads to a better running time bound than the loss filter, the latter is superior in all experiments. This remains true for large instances, even if the domination concept is complemented by the Horowitz/Sahni decomposition (2 lists of Pareto points). The Combo code is superior to all three variants. Interestingly, the combination of the three features (PO+Loss+2) clearly outperforms Combo for uniform and δ-correlated instances. In fact, it exhibits a sublinear running time for uniform instances (without linear time preprocessing). Using the described heuristics significantly improves the running times further. The average running times of Combo and our fastest implementation differ by a factor of 190 for uniform instances with 1024000 items. For δ-correlated instances with 10000 items and $\delta = 1/1024$, the factor is about 700.

The limitations of our implementation are illustrated in Figure 5e, which shows the running time (with preprocessing) for instances with similar profit. Besides the large core (Figure 4d), the large integrality gap makes the loss filter less efficient.
Abstract. Edge contraction is shown to be a useful mechanism to improve lower bound heuristics for treewidth. A successful lower bound for treewidth is the degeneracy: the maximum, over all subgraphs, of the minimum degree. The degeneracy is polynomial time computable. We introduce the notion of contraction degeneracy: the maximum, over all graphs that can be obtained by contracting edges, of the minimum degree. We show that the problem to compute the contraction degeneracy is NP-hard, but for fixed k, it is polynomial time decidable if a given graph G has contraction degeneracy at least k. Heuristics for computing the contraction degeneracy are proposed and experimentally evaluated. It is shown that these can lead to considerable improvements to the lower bound for treewidth. A similar study is made for the combination of contraction with Lucena’s lower bound based on Maximum Cardinality Search [12].



1 Introduction 

It is about two decades ago that the notion of treewidth and the equivalent notion of partial k-tree were introduced. Nowadays, these play an important role in many theoretical studies in graph theory and algorithms, but their use for practical applications is also growing, see, e.g., [10,11]. A first step when solving problems on graphs of bounded treewidth is to compute a tree decomposition of (close to) optimal width, on which often a dynamic programming approach is applied. Such a dynamic programming algorithm typically has a running time that is exponential in the treewidth of the graph. Since determining the treewidth of a graph is an NP-hard problem, it is rather unlikely to find efficient algorithms for computing the treewidth. Therefore, we are interested in lower and upper bounds for the treewidth of a graph.

This paper focuses on lower bounds for the treewidth. Good lower bounds can serve to speed up branch and bound methods, inform us about the quality of upper bound heuristics, and in some cases, tell us that we should not use tree decompositions to solve a problem on a certain instance. A large lower bound on the treewidth of a graph implies that we should not hope for efficient dynamic programming algorithms that use tree decompositions for this particular instance.

* This work was partially supported by the DFG research group “Algorithms, Structure, Randomness” (Grant number GR 883/9-3, GR 883/9-4), and partially by the Netherlands Organisation for Scientific Research NWO (project Treewidth and Combinatorial Optimisation).

In this paper, we investigate the combination of contracting edges with two existing lower bounds for treewidth, namely the degeneracy (or MMD [9]) and the MCSLB (the Maximum Cardinality Search Lower Bound), introduced by Lucena [12]. The definitions of these lower bounds can be found in Section 2. Contracting an edge is replacing its two endpoints by a single vertex, which is adjacent to all vertices that at least one of the two endpoints was adjacent to. Combining the notion of contraction with degeneracy gives the new notion of contraction degeneracy of a graph G: the maximum, over all graphs that can be obtained by contracting edges of G, of the minimum degree. It provides us with a new lower bound for the treewidth of graphs. While, unfortunately, computing the contraction degeneracy of a graph is NP-complete (as is shown in Section 3.2), the fixed parameter cases are polynomial time solvable (see Section 3.3), and there are simple heuristics that provide us with good bounds on several instances taken from real life applications (see Section 5).

We experimented with three different heuristics for the contraction degeneracy, based on repeatedly contracting a vertex of minimum degree with (a) a neighbor of smallest degree, (b) a neighbor of largest degree, or (c) a neighbor that has as few common neighbors as possible. We see that the last one outperforms the other two, which can be explained by noting that such a contraction ‘destroys’ only few edges in the graph. Independently, the heuristic that contracts with a neighbor of minimum degree has been proposed as a lower bound heuristic for treewidth by Gogate and Dechter [7] in their work on branch and bound algorithms for treewidth.
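The three neighbour-selection rules can be sketched in Python as follows. This is our own illustration, not the authors' implementation: the strategy labels "min-d", "max-d" and "least-c" are hypothetical names for rules (a), (b) and (c), we read rule (c) as "fewest common neighbours with the minimum-degree vertex", and ties are broken arbitrarily. Every intermediate graph is a minor of the input, so the largest minimum degree encountered is a lower bound for the treewidth.

```python
def contraction_degeneracy_lb(adj, strategy="least-c"):
    """Treewidth lower bound by repeatedly contracting a minimum-degree
    vertex into one of its neighbours (hypothetical helper).

    adj      -- dict: vertex -> set of neighbours (simple undirected graph)
    strategy -- "min-d": neighbour of smallest degree,
                "max-d": neighbour of largest degree,
                "least-c": neighbour with fewest common neighbours.
    """
    g = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    best = 0
    while len(g) >= 2:
        v = min(g, key=lambda x: len(g[x]))
        best = max(best, len(g[v]))
        if not g[v]:                    # isolated vertex: delete it
            del g[v]
            continue
        if strategy == "min-d":
            u = min(g[v], key=lambda x: len(g[x]))
        elif strategy == "max-d":
            u = max(g[v], key=lambda x: len(g[x]))
        elif strategy == "least-c":
            u = min(g[v], key=lambda x: len(g[v] & g[x]))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        for w in g[v]:                  # contract edge {v, u} into u
            g[w].discard(v)
            if w != u:
                g[w].add(u)
                g[u].add(w)
        del g[v]
    return best
```

For instance, on a 3 × 3 grid (degeneracy 2, treewidth 3) the bound returned lies between 2 and 3, depending on how ties are broken.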

The lower bound provided by MCSLB is never smaller than the degeneracy, but can be larger [3]. The optimisation problem to find the largest possible bound of MCSLB for a graph obtained from G by contracting edges is also NP-hard, and polynomial time solvable for the fixed parameter case (Section 4). We also studied some heuristics for this bound. In our experiments, we have seen that typically, the bound by MCSLB is equal to the degeneracy or slightly larger. In both cases, often a large increase in the lower bound is obtained when we combine the method with contraction. See Section 5 for results of our computational experiments. They show that contraction is a very viable idea for obtaining treewidth lower bounds.

A different lower bound for treewidth was provided by Ramachandramurthi 
[13,14]. While this lower bound appears to generally give small lower bound 
values, it can also be combined with contraction. Some first tests indicate that 
this gives in some cases a small improvement to the lower bound obtained with 
contraction degeneracy. 

An interesting and useful method for getting treewidth lower bounds was given by Clautiaux et al. [4]. The procedure from [4] uses another treewidth lower bound as a subroutine. The description in [4] uses the degeneracy, but one can also use another lower bound instead. Our experiments (not given in this extended abstract for space reasons) showed that the contraction degeneracy heuristics generally outperform the method of [4] with degeneracy. Very recently, we combined the method of [4] with the contraction degeneracy heuristics, using a ‘graph improvement step as in [4]’ after each contraction of an edge. While this increases the time of the algorithm significantly, the algorithm still takes polynomial time, and in many cases considerable improvements to the lower bound are obtained. More details of this new heuristic will be given in the full version of this paper.



2 Preliminaries 

Throughout the paper, $G = (V, E)$ denotes a simple undirected graph. Most of our terminology is standard graph theory/algorithm terminology. As usual, the degree in $G$ of vertex $v$ is $d_G(v)$ or simply $d(v)$. $N(S)$ for $S \subseteq V$ denotes the open neighbourhood of $S$, i.e., $N(S) = \bigcup_{s \in S} N(s) \setminus S$. We define $\delta(G) := \min_{v \in V} d(v)$.

Subgraphs and Minors. After deleting vertices of a graph and their incident 
edges, we get an induced subgraph. A subgraph is obtained, if we additionally allow 
deletion of edges. If we furthermore allow edge-contractions, we get a minor. The 
treewidth of a minor of G is at most the treewidth of G (see e.g. [2]). 

Edge-Contraction. Contracting edge $e = \{u, v\}$ in the graph $G = (V, E)$, denoted as $G/e$, is the operation that introduces a new vertex $a_e$ and new edges such that $a_e$ is adjacent to all the neighbours of $v$ and $u$, and deletes vertices $u$ and $v$ and all edges incident to $u$ or $v$: $G/e := (V', E')$, where $V' = \{a_e\} \cup V \setminus \{u, v\}$ and $E' = \{\{a_e, x\} \mid x \in N(\{u, v\})\} \cup E \setminus \{e' \in E \mid e' \cap e \neq \emptyset\}$. A contraction-set is a cycle free set $E' \subseteq E(G)$ of edges. Note that after each single edge-contraction the names of the vertices are updated in the graph. Hence, for two adjacent edges $e = \{u, v\}$ and $f = \{v, w\}$, edge $f$ will be different after contracting edge $e$, namely in $G/e$ we have $f = \{a_e, w\}$. Nevertheless, we regard $f$ as representing the same edge in $G$ and in $G/e$. For a contraction-set $E' = \{e_1, e_2, \dots, e_p\}$, we define $G/E' := G/e_1/e_2/\dots/e_p$. Furthermore, note that the order of edge-contractions to obtain $G/E'$ is not relevant. A contraction $H$ of $G$ is a graph such that there exists a contraction-set $E'$ with $H = G/E'$.
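A single contraction $G/e$ on adjacency sets can be sketched in Python as below (our own illustration; the helper name `contract` and the default name `(u, v)` for the new vertex $a_e$ are hypothetical conventions). The example shows that contracting the edges of a contraction-set in different orders yields the same graph up to the names of the new vertices.

```python
def contract(adj, u, v, new=None):
    """Return G/e for the edge e = {u, v}; the input graph is not modified.

    adj -- dict: vertex -> set of neighbours. The new vertex a_e is named
    `new`; by default the tuple (u, v) is used (a hypothetical convention).
    """
    if v not in adj.get(u, ()):
        raise ValueError("{u, v} is not an edge of the graph")
    if new is None:
        new = (u, v)
    if new in adj and new not in (u, v):
        raise ValueError("name of the new vertex is already in use")
    nbrs = (adj[u] | adj[v]) - {u, v}     # N({u, v})
    result = {x: ns - {u, v} for x, ns in adj.items() if x not in (u, v)}
    for x in nbrs:
        result[x].add(new)
    result[new] = set(nbrs)
    return result


# Triangle a, b, c with a pendant vertex d attached to c.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
H = contract(G, "a", "b", new="ab")
# Contracting {a, b} then {ab, c}, or {b, c} then {a, bc}, gives the same graph.
H2 = contract(contract(G, "a", "b", "x"), "x", "c", "y")
H3 = contract(contract(G, "b", "c", "x"), "a", "x", "y")
```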

Treewidth. A tree decomposition of $G = (V, E)$ is a pair $(\{X_i \mid i \in I\}, T = (I, F))$, with $\{X_i \mid i \in I\}$ a family of subsets of $V$ and $T$ a tree, such that (i) $\bigcup_{i \in I} X_i = V$, (ii) for all $\{v, w\} \in E$, there is an $i \in I$ with $v, w \in X_i$, and (iii) for all $i_0, i_1, i_2 \in I$: if $i_1$ is on the path from $i_0$ to $i_2$ in $T$, then $X_{i_0} \cap X_{i_2} \subseteq X_{i_1}$. The width of tree decomposition $(\{X_i \mid i \in I\}, T = (I, F))$ is $\max_{i \in I} |X_i| - 1$. The treewidth $tw(G)$ of $G$ is the minimum width among all tree decompositions of $G$.
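The three conditions translate directly into a checker. The Python sketch below is our own illustration with hypothetical helper names; it tests condition (iii) in its well-known equivalent form, namely that for every vertex $v$ the tree nodes whose bags contain $v$ induce a connected subtree.

```python
def is_tree_decomposition(adj, bags, tree):
    """Check the three tree-decomposition conditions.

    adj  -- graph: vertex -> set of neighbours
    bags -- tree node i -> set X_i of graph vertices
    tree -- tree node -> set of adjacent tree nodes (assumed to form a tree)
    """
    if set().union(*bags.values()) != set(adj):        # (i) every vertex covered
        return False
    for v in adj:                                       # (ii) every edge covered
        for w in adj[v]:
            if not any(v in X and w in X for X in bags.values()):
                return False
    for v in adj:                                       # (iii) connected subtrees
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tree[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


def width(bags):
    """max |X_i| - 1"""
    return max(len(X) for X in bags.values()) - 1


# The 4-cycle a-b-c-d-a has a tree decomposition of width 2 with two bags.
C4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
tree = {0: {1}, 1: {0}}
```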
Degeneracy/MMD. We also use the term MMD (Maximum Minimum Degree) for the degeneracy. The degeneracy $\delta D$ of a graph $G$ is defined to be $\delta D(G) := \max_{G'} \{\delta(G') \mid G' \text{ is a subgraph of } G\}$. The minimum degree of a graph is a lower bound for its treewidth, and the treewidth of $G$ cannot increase by taking subgraphs. Hence, the treewidth of $G$ is at least its degeneracy. (See also [9].)
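The degeneracy can be computed by the classical greedy rule of repeatedly deleting a vertex of minimum degree and recording the largest minimum degree encountered. A short Python sketch (our own illustration; quadratic time for brevity, whereas bucket queues give linear time):

```python
def degeneracy(adj):
    """MMD: max over all subgraphs of the minimum degree.

    adj -- dict: vertex -> set of neighbours. Repeatedly deletes a vertex of
    minimum degree; the largest minimum degree seen equals the degeneracy.
    """
    g = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    best = 0
    while g:
        v = min(g, key=lambda x: len(g[x]))
        best = max(best, len(g[v]))
        for w in g[v]:
            g[w].discard(v)
        del g[v]
    return best


# K4 with a pendant vertex 4: deleting the pendant leaves K4, so the bound is 3.
K4_pendant = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
```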

Maximum Cardinality Search. MCS is a method to number the vertices of a graph. It was first introduced by Tarjan and Yannakakis for the recognition of chordal graphs [18]. We start by giving some vertex number 1. In step $i = 2, \dots, n$, we choose an unnumbered vertex $v$ that has the largest number of already numbered neighbours, breaking ties as we wish. Then we associate number $i$ to vertex $v$.

An MCS-ordering $\psi$ can be defined by mapping each vertex to its number: $\psi(v) :=$ number of $v$. For a fixed MCS-ordering, let $v_i := \psi^{-1}(i)$. The visited degree $vd_\psi(v_i)$ of $v_i$ is defined as follows: $vd_\psi(v_i) := d_{G[v_1, \dots, v_i]}(v_i)$. The visited degree $MCSLB_\psi$ of an MCS-ordering $\psi$ is defined as follows: $MCSLB_\psi := \max_{v \in V(G)} vd_\psi(v)$.

In [12], Lucena has shown that for every graph $G$ and MCS-ordering $\psi$ of $G$, $MCSLB_\psi \le tw(G)$. Thus, an MCS numbering gives a lower bound for the treewidth of a graph.
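Since $vd_\psi(v_i)$ is the number of neighbours of $v_i$ that were numbered before it, $MCSLB_\psi$ can be recorded while the search runs. The Python sketch below is our own illustration (the helper name is hypothetical); it breaks ties by the iteration order of the adjacency dictionary, which is one admissible rule among many.

```python
def mcs_lower_bound(adj, start):
    """One Maximum Cardinality Search from `start`.

    Returns (order, bound): order lists v_1, ..., v_n and bound is MCSLB_psi,
    the largest number of already numbered neighbours a vertex has at the
    moment it is numbered (its visited degree).
    """
    count = {v: 0 for v in adj}       # numbered neighbours of each vertex
    numbered, order, bound = set(), [], 0
    v = start
    while v is not None:
        bound = max(bound, count[v])  # visited degree of v
        numbered.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in numbered:
                count[w] += 1
        rest = [x for x in adj if x not in numbered]
        # max() returns the first vertex with the largest count (tie-breaking).
        v = max(rest, key=lambda x: count[x]) if rest else None
    return order, bound


C5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
order, bound = mcs_lower_bound(C5, 0)
```

Running the search from several start vertices and keeping the largest bound is a cheap way to improve the result, since each run yields a valid lower bound.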

3 Contraction Degeneracy 

We define the new parameter of contraction degeneracy and the related computational problem. Then we will show the NP-completeness of the problem just defined, and consider the complexity of the fixed parameter cases.

3.1 Definition of the Problem 

Definition 1. The contraction degeneracy $\delta C$ of a graph $G$ is defined as follows: $\delta C(G) := \max\{\delta(G') \mid G' \text{ is a minor of } G\}$.

It is also possible to define $\delta C$ as the maximum over all contractions of the minimum degree of the contraction. Both definitions are equivalent, since deleting edges or vertices does not help to obtain larger degrees. The corresponding decision problem is formulated as usual:

Problem: Contraction Degeneracy
Instance: Graph $G = (V, E)$ and integer $k \ge 0$.
Question: Is the contraction degeneracy of $G$ at least $k$, i.e., is $\delta C(G) \ge k$, i.e., is there a contraction-set $E' \subseteq E$ such that $\delta(G/E') \ge k$?

Lemma 1. For a graph $G$, it holds that $\delta C(G) \le tw(G)$.

Proof. Note that for any minor $G'$ of $G$, we have $tw(G') \le tw(G)$. Furthermore, for any graph $G'$: $\delta(G') \le tw(G')$. The lemma follows now directly. □
Fig. 1. Intermediate graph G' constructed for the transformation.



3.2 NP-Completeness

Theorem 1. The Contraction Degeneracy problem is NP-complete for bipartite graphs.

Proof. Because of space constraints, we will not give the full proof here. Instead, we only show the construction for the transformation. Clearly, the problem is in NP. The hardness proof is a transformation from the Vertex Cover problem. Let a Vertex Cover instance $(G, k)$ be given, with $G = (V, E)$.

Construction: Starting from the complement of $G$, we add vertices and edges to construct a new graph $G'$, which is an intermediate step in the transformation. $G' = (V', E')$ is formally defined as follows (see Figure 1):

$$V' = V \cup \{w_1, w_2\} \cup \{u_1, \dots, u_k\}$$

$$E' = (V \times V \setminus E) \cup \{\{w_1, w_2\}\} \cup \{\{w_i, v\} \mid i \in \{1, 2\} \wedge v \in V\} \cup \{\{u_i, v\} \mid i \in \{1, \dots, k\} \wedge v \in V\}$$

The final graph $G^* = (V^*, E^*)$ in our construction is obtained by subdividing every edge in $G'$, i.e., replacing each edge in $G'$ by a path with two edges:

$$V^* = V' \cup \{v_e \mid e \in E'\} \quad \text{and} \quad E^* = \{\{u, v_e\}, \{v_e, w\} \mid e = \{u, w\} \in E'\}$$

Let $n := |V|$; the constructed instance of the Contraction Degeneracy problem is $(G^*, n + 1)$.

Now, it is easy to see that $G^*$ is a bipartite graph. It is also possible to prove that there is a vertex cover for $G$ of size at most $k$ if, and only if, $\delta C(G^*) \ge n + 1$.

□



3.3 Fixed Parameter Cases 

Now, we consider the fixed parameter case of the Contraction Degeneracy 
problem. I.e., for a fixed integer k, we consider the problem to decide for a given 
graph G if G has a contraction with minimum degree k. Graph minor theory 
gives a fast answer to this problem. Those unfamiliar with this theory are referred 
to [6]. 
Theorem 2. The Contraction Degeneracy problem can be solved in linear time when $k$ is a fixed integer with $k \le 5$, and can be solved in $O(n^3)$ time when $k$ is a fixed integer with $k \ge 6$.

Proof. Let $k$ be a fixed integer. Consider the class of graphs $\mathcal{G}_k = \{G \mid G \text{ has contraction degeneracy at most } k - 1\}$. $\mathcal{G}_k$ is closed under taking minors: if $H$ is a minor of $G$ and $H$ has contraction degeneracy at least $k$, then $G$ also has contraction degeneracy at least $k$. As every class of graphs that is closed under minors has an $O(n^3)$ algorithm to test membership by Robertson-Seymour graph minor theory (see [6]), the theorem follows for the case that $k \ge 6$.

Suppose now that $k \le 5$. There exists a planar graph $G_k$ with minimum degree $k$. Hence, $G_k \notin \mathcal{G}_k$. Every class of graphs that is closed under taking minors and does not contain all planar graphs has a linear time membership test (see [6]), which shows the result for the case that $k \le 5$. □

It can be noted that the cases k = 1 and k = 2 are very simple: a graph has contraction degeneracy at least 1 if and only if it has at least one edge, and it has contraction degeneracy at least 2 if and only if it is not a forest.
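These two characterisations translate directly into simple tests. The following Python sketch assumes a simple graph with vertices numbered 0..n-1; the function names are hypothetical. The forest test uses a union-find scan, which is close to but not strictly linear as written; a breadth-first search would give a strictly linear bound.

```python
def contraction_degeneracy_at_least_1(n, edges):
    """Contraction degeneracy >= 1 iff the graph has at least one edge."""
    return len(edges) > 0


def contraction_degeneracy_at_least_2(n, edges):
    """Contraction degeneracy >= 2 iff the graph is not a forest.

    Vertices are 0..n-1 and edges is a list of pairs of a simple graph.
    A cycle exists as soon as an edge joins two already connected vertices.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True   # edge {u, v} closes a cycle
        parent[ru] = rv   # merge the two trees
    return False
```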

The result is non-constructive when $k \ge 6$; when $k \le 5$, the result can be made constructive by observing that, for fixed k, the property that G has contraction degeneracy at least k can be formulated in monadic second order logic (MSOL). Thus, we can solve the problem as follows. The result of [17] applied to $G_k$ gives an explicit upper bound $c_k$ on the treewidth of graphs in $\mathcal{G}_k$. Test whether G has treewidth at most $c_k$, and if so, find a tree decomposition of width at most $c_k$ with the algorithm of [1]. If G has treewidth at most $c_k$, use the tree decomposition to test whether the MSOL formula holds for G [5]; if not, we directly know that G has contraction degeneracy at least k. The constant factors hidden in the O-notation are very large, so these results are of theoretical importance only. We summarise the different cases in the following table.



Table 1. Complexity of Contraction Degeneracy

k             Time          Reference
1             O(n)          Trivial
2             O(n)          G is not a forest
3, 4, 5       O(n)          [1, 5, 17], MSOL
fixed k ≥ 6   O(n^3)        [16, 15]
variable k    NP-complete   Theorem 1



4 Maximum Cardinality Search with Contraction 

From a maximum cardinality search ordering, we can derive a parameter that is a lower bound for the treewidth of a graph (see Section 2). We define four problems related to this parameter. Each of these is NP-complete or NP-hard. We consider the following problem and its variants.
Problem: MCSLB With Contraction
Instance: Graph G = (V, E), integer k.

Question: Does G have a contraction H, and H an MCS-ordering ψ with the visited degree of ψ at least k?

In the variant MCSLB With Minors we instead require that H is a minor of G. In the variants MinMCSLB With Contraction and MinMCSLB With Minors we ask that every MCS-ordering ψ of H has visited degree at least k.

Theorem 3. MCSLB With Contraction and MCSLB With Minors are NP-complete. MinMCSLB With Contraction and MinMCSLB With Minors are NP-hard.

The hardness proofs are transformations from Vertex Cover, using a somewhat more involved variant of the proof of Theorem 1. Membership in NP is trivial for MCSLB With Contraction and MCSLB With Minors, but unknown for the other two problems.

The fixed parameter case of MCSLB With Minors can be solved in linear time with the help of graph minor theory. The set of graphs {G | G does not have a minor H such that H has an MCS-ordering ψ with visited degree at least k} is closed under taking minors and does not include all planar graphs (see [3]). Together with the Graph Minor Theorem of Robertson and Seymour and the results in [1,17], this gives the following result.

Theorem 4. MCSLB With Minors and MinMCSLB With Minors are 
linear time solvable for fixed k. 

The fixed parameter cases of MinMCSLB With Minors give an $O(n^3)$ algorithm (linear when $k \le 5$), similar to Theorem 2. Again, note that these results are only of theoretical interest, due to the high constant factors hidden in the O-notation and the non-constructiveness of the proof.



5 Experimental Results 

In this section, we report on the results of computational experiments we have carried out. We tested our algorithms on a number of graphs obtained from two application areas where treewidth plays a role in algorithms that solve the problems at hand. The first set of instances are probabilistic networks from existing decision support systems in fields like medicine and agriculture. The second set of instances are from frequency assignment problems from the EUCLID CALMA project (see e.g. [8]). We have also used this set of instances in earlier experiments. We excluded those networks for which the MMD heuristic already gives the exact treewidth. Some of the graphs can be obtained from [19]. All algorithms have been written in C++, and the computations have been carried out on a Linux PC with a 3.0 GHz Intel Pentium 4 processor.
Table 2. Graph sizes, upper bounds, and MMD/MMD+ lower bounds

                                   MMD       MMD+       MMD+       MMD+
                                           min-d.     max-d.   least-c.
instance     |V|   |E|   UB   LB   CPU   LB   CPU   LB   CPU   LB   CPU
barley        48   126    7    5  0.00    6  0.00    5  0.00    6  0.00
diabetes     413   819    4    3  0.00    4  0.01    4  0.00    4  0.00
link         724  1738   13    4  0.00    8  0.02    5  0.01   11  0.03
mildew        35    80    4    3  0.00    4  0.00    3  0.00    4  0.00
munin1       189   366   11    4  0.00    8  0.01    5  0.00   10  0.00
munin2      1003  1662    7    3  0.01    6  0.01    4  0.01    6  0.02
munin3      1044  1745    7    3  0.01    7  0.01    4  0.02    7  0.02
munin4      1041  1843    8    4  0.01    7  0.01    5  0.01    7  0.02
oesoca+       67   208   11    9  0.00    9  0.00    9  0.00    9  0.00
oow-trad      33    72    6    3  0.00    4  0.00    4  0.00    5  0.00
oow-bas       27    54    4    3  0.00    4  0.00    3  0.00    4  0.00
oow-solo      40    87    6    3  0.00    4  0.00    4  0.00    5  0.00
pathfinder   109   211    6    5  0.00    6  0.00    5  0.00    6  0.01
pignet2     3032  7264  135    4  0.01   29  0.11   10  0.07   38  0.20
pigs         441   806   10    3  0.00    6  0.01    4  0.00    7  0.01
ship-ship     50   114    8    4  0.00    6  0.00    4  0.00    6  0.00
water         32   123   10    6  0.00    7  0.00    7  0.00    8  0.00
wilson        21    27    3    2  0.00    3  0.00    3  0.00    3  0.00
celar01      458  1449   17    8  0.00   12  0.01    9  0.00   14  0.02
celar02      100   311   10    9  0.00    9  0.00    9  0.00   10  0.00
celar03      200   721   15    8  0.00   11  0.00    9  0.00   13  0.01
celar04      340  1009   16    9  0.00   12  0.01    9  0.00   13  0.01
celar05      200   681   15    9  0.01   11  0.00    9  0.00   13  0.01
celar06      100   350   11   10  0.00   11  0.00   10  0.00   11  0.01
celar07      200   817   18   11  0.00   13  0.00   12  0.01   15  0.00
celar08      458  1655   18   11  0.00   13  0.01   12  0.01   15  0.01
celar09      340  1130   18   11  0.01   13  0.01   12  0.00   15  0.01
celar11      340   975   15    8  0.00   11  0.00    9  0.00   13  0.01
graph01      100   358   25    8  0.00    9  0.01    9  0.00   14  0.00
graph02      200   709   51    6  0.00   11  0.00    7  0.01   20  0.02
graph03      100   340   22    5  0.00    8  0.00    6  0.01   14  0.00
graph04      200   734   55    6  0.00   12  0.01    7  0.00   19  0.02
graph05      100   416   26    8  0.00    9  0.00    9  0.00   15  0.00
graph06      200   843   53    8  0.00   12  0.01    9  0.00   21  0.02
graph07      200   843   53    8  0.00   12  0.01    9  0.00   21  0.02
graph08      340  1234   91    7  0.00   16  0.02    8  0.01   26  0.04
graph09      458  1667  118    8  0.01   17  0.03    9  0.01   29  0.06
graph10      340  1275   96    6  0.00   15  0.01    7  0.01   27  0.04
graph11      340  1425   98    7  0.00   17  0.01    8  0.01   27  0.05
graph12      340  1256   90    5  0.00   16  0.01    6  0.01   25  0.04
graph13      458  1877  126    6  0.00   18  0.02    7  0.01   31  0.07
graph14      458  1398  121    4  0.00   20  0.02    8  0.02   27  0.05
Table 3. MCSLB/MCSLB+ lower bounds

                                           MCSLB+ LBs
                MCSLB     min-deg.          last-mcs           max-mcs       average
instance     LB   CPU   min-d. least-c.   min-d. least-c.   min-d. least-c.      CPU
barley        ?  0.01        6        6        6        6        6        5     0.06
diabetes      4  0.92        4        4        4        4        4        4     5.23
link          ?  3.09        8       10        8       11        8        6    43.08
mildew        3  0.00        4        4        4        4        4        4     0.03
munin1        4  0.17        8       10        9       10        9        7     0.95
munin2        4  5.46        6        6        6        6        5        6    31.30
munin3        4  5.87        6        7        7        7        6        7    33.80
munin4        ?  6.06        7        7        7        8        6        7    48.30
oesoca+       9  0.02        9        9        9        9        9        9     0.15
oow-trad      4  0.00        5        5        5        5        5        4     0.03
oow-bas       3  0.00        4        4        4        4        4        4     0.02
oow-solo      4  0.01        4        5        4        5        5        5     0.05
pathfinder    6  0.05        6        6        6        6        6        6     0.34
pignet2       ? 59.60       28       39       30       39       16       18   509.60
pigs          3  1.01        7        7        7        7        6        6     5.12
ship-ship     5  0.01        6        6        6        6        6        6     0.06
water         8  0.00        8        8        8        8        8        8     0.04
wilson        3  0.00        3        3        3        3        3        3     0.01
celar01      10  1.20       12       13       12       14       12       13    17.08
celar02       9  0.06        9       10        9       10        9        9     0.30
celar03       9  0.23       11       12       11       13       12       12     1.54
celar04      11  0.66       11       13       12       13       12       13     4.85
celar05       9  0.23       12       13       12       13       12       12     1.80
celar06      11  0.06       11       11       11       11       11       11     0.33
celar07      12  0.24       13       15       13       15       13       15     1.66
celar08      12  1.45       13       15       14       15       13       15    17.04
celar09      12  0.82       13       15       14       15       14       15     4.62
celar11      10  0.66       11       13       12       13       12       12     4.43
graph01       9  0.06        9       15       11       14       12       13     0.45
graph02       8  0.24       12       19       12       19       14       13     2.54
graph03       6  0.06        9       13       10       14        9       13     0.49
graph04       8  0.25       12       20       13       20       14       16     1.95
graph05       9  0.06       10       15       11       16       11       14     0.47
graph06       9  0.26       12       22       14       22       14       15     2.22
graph07       9  0.26       12       22       14       22       14       15     2.15
graph08       9  0.76       17       26       18       26       18       20     5.89
graph09       9  1.43       17       30       20       28       22       23    11.51
graph10       8  0.77       18       26       17       26       19       22     6.25
graph11       8  0.80       15       27       18       27       17       26     6.53
graph12       7  0.76       15       25       16       25       17       21     5.92
graph13       8  1.48       18       32       19       31       20       28    12.89
graph14       5  1.35       19       28       20       28       21       25    10.11
5.1 The Heuristics

MMD computes exactly the degeneracy by deleting a vertex of minimum degree and its incident edges in each iteration; the lower bound is the largest minimum degree encountered.
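A direct Python sketch of MMD follows. It uses a simple quadratic scan for the minimum-degree vertex instead of the bucket structure a linear-time implementation would use; the function name and the adjacency-dict representation are our own choices.

```python
def mmd(adj):
    """Degeneracy of a graph, i.e. the MMD lower bound on treewidth.

    adj maps each vertex to the set of its neighbours; the input is copied.
    In each iteration a vertex of minimum degree is deleted together with its
    incident edges, and the largest minimum degree seen is returned.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    best = 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        best = max(best, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best
```

For example, on K4 with one edge subdivided, MMD returns 2, although this graph has treewidth 3.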

MMD+ is a derivative of MMD. Instead of deleting a vertex v of minimum degree, we contract it into a neighbour u. We consider three strategies for selecting this neighbour:

— min-d. selects a neighbour with minimum degree, motivated by the idea that 
the smallest degree is increased as fast as possible in this way. 

— max-d. selects a neighbour with maximum degree, motivated by the idea 
that we end up with some vertices of very high degree. 

— least-c. selects a neighbour u of v such that u and v have the least number of common neighbours, motivated by the idea of deleting as few edges as possible in each iteration in order to obtain a high minimum degree.
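The MMD+ loop with the three neighbour-selection strategies can be sketched in Python as below. This is a minimal illustration, not the authors' C++ implementation: the strategy strings are our own names for min-d., max-d. and least-c., and ties are broken by Python's set iteration order, which may change results on other graphs.

```python
def mmd_plus(adj, strategy="least-c"):
    """MMD+ lower bound: contract a minimum-degree vertex into a neighbour.

    strategy selects the neighbour u of the minimum-degree vertex v:
      "min-d"   -> neighbour of minimum degree
      "max-d"   -> neighbour of maximum degree
      "least-c" -> neighbour with the fewest common neighbours with v
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    best = 0
    while len(adj) > 1:
        v = min(adj, key=lambda x: len(adj[x]))
        best = max(best, len(adj[v]))
        if not adj[v]:            # isolated vertex: nothing to contract
            del adj[v]
            continue
        if strategy == "min-d":
            u = min(adj[v], key=lambda x: len(adj[x]))
        elif strategy == "max-d":
            u = max(adj[v], key=lambda x: len(adj[x]))
        elif strategy == "least-c":
            u = min(adj[v], key=lambda x: len(adj[x] & adj[v]))
        else:
            raise ValueError(strategy)
        # Contract edge {u, v} into u: u inherits v's other neighbours.
        for x in adj[v]:
            adj[x].discard(v)
            if x != u:
                adj[x].add(u)
                adj[u].add(x)
        del adj[v]
    return best
```

On K4 with one edge subdivided, every strategy first contracts the subdivision vertex, which recreates K4, so MMD+ returns 3 where MMD returns only 2.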

MCSLB computes |V| MCS-orderings ψ, one for each vertex as start vertex. It returns the maximum of MCSLB_ψ over these orderings, cf. [3].
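A minimal Python sketch of MCSLB is given below, assuming the usual definition of maximum cardinality search: repeatedly visit an unvisited vertex with the most visited neighbours, and take as visited degree the largest such count at the moment a vertex is visited. The function names are hypothetical, and since ties are broken by dictionary order, MCSLB_ψ for a given start vertex may differ from other implementations.

```python
def mcs_visited_degree(adj, start):
    """Visited degree of one MCS-ordering psi that starts at `start`.

    Each step visits an unvisited vertex with the largest number of already
    visited neighbours; the visited degree of psi is the largest such number
    observed when a vertex is visited. Ties follow the iteration order of adj.
    """
    visited = set()
    count = {v: 0 for v in adj}   # number of visited neighbours of each vertex
    best = 0
    v = start
    while True:
        best = max(best, count[v])
        visited.add(v)
        for u in adj[v]:
            if u not in visited:
                count[u] += 1
        unvisited = [u for u in adj if u not in visited]
        if not unvisited:
            return best
        v = max(unvisited, key=lambda u: count[u])


def mcslb(adj):
    """Maximum of MCSLB_psi over one MCS-ordering per start vertex."""
    return max(mcs_visited_degree(adj, s) for s in adj)
```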

MCSLB+ initially uses MCSLB to find a start vertex w with the largest MCSLB_ψ. After each computation of an MCS-ordering, we select a vertex to contract, and then we apply MCSLB+ again. To reduce the CPU time consumption, MCSLB+ is computed only for start vertex w instead of all possible start vertices. Three strategies for selecting a vertex v to be contracted are examined:

— min-deg. selects a vertex of minimum degree. 

— last-mcs selects the last vertex in the just computed MCS-ordering.

— max-mcs selects a vertex with maximum visited degree in the just computed 
MCS-ordering. 

Once a vertex v is selected, we select a neighbour u of v using the two strategies min-d. and least-c. that were already explained for MMD+.

5.2 Tables 

Tables 2 and 3 show the computational results for the described graphs. Table 2 shows the size of the graphs and the best known upper bound UB for treewidth, cf. [9,19]. Furthermore, Table 2 presents the lower bounds LB and the CPU times in seconds for the MMD and MMD+ heuristics with different selection strategies. Table 3 shows the same information for the MCSLB and MCSLB+ heuristics with different combinations of selection strategies. Because of space constraints and similarity, we only give the average running times.

In both tables, the graphs for which the lower and upper bound coincide are highlighted in bold. In total, the treewidth of 8 graphs could be determined by MMD+, and one more by MCSLB+.
The MMD+ results show that the min-d. and in particular the least-c. strategy are very successful. The max-d. strategy is only marginally better than the degeneracy. For MCSLB+ we observe that min-deg. and last-mcs in combination with least-c. outperform the other strategies. The added value of MCSLB+ over MMD+ is marginal, also because of the relatively large computation times.

The success of the least-c. strategy can be explained as follows. When we contract an edge {v, w}, for each common neighbour x of v and w, the two edges {v, x} and {w, x} become the same edge in the new graph. Thus, with the least-c. strategy, more edges remain in the graph after contraction, which often leads to a larger minimum degree.
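The edge count behind this argument can be checked directly: contracting {v, w} removes the edge itself and one edge per common neighbour, so |E| drops by exactly 1 + |N(v) ∩ N(w)|. A small Python sketch with hypothetical helper names:

```python
def contract(adj, v, w):
    """Return a copy of the graph with edge {v, w} contracted into w."""
    new = {x: set(ns) for x, ns in adj.items() if x != v}
    for x in adj[v]:
        if x == w:
            continue
        new[x].discard(v)
        new[x].add(w)   # if x was already adjacent to w, the two edges merge
        new[w].add(x)
    new[w].discard(v)
    return new


def num_edges(adj):
    """Number of edges of a simple graph given as an adjacency dict."""
    return sum(len(ns) for ns in adj.values()) // 2
```

In the five-vertex example used below, v and w share the single neighbour x, so contracting {v, w} turns 5 edges into 3.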

6 Discussion and Concluding Remarks

In this article, we examined two new methods, MMD+ and MCSLB+, for treewidth lower bounds, which are derivatives of the MMD and MCSLB heuristics. We showed the two corresponding decision problems to be NP-complete.

The practical experiments show that contracting edges is a very good approach for obtaining lower bounds for treewidth, as it considerably improves known lower bounds. When contracting a vertex, it appears best to contract it into a neighbour that has as few common neighbours as possible. In some cases MCSLB+ gives a slightly better lower bound than MMD+. However, the running times of the MMD+ heuristic are almost always negligible, while the running times of the MCSLB+ heuristic can be considerably larger. Furthermore, we see that the strategy that selects a neighbour u of v with the least number of common neighbours of u and v often performs best.

Finally, notice that although the gap between lower and upper bound could be significantly closed by contracting edges within the algorithms, the absolute gap is still large for many graphs (pignet2, graph*), which calls for research on new and better heuristics.

MMD+ is a simple method that usually performs very well. However, it has its limits, e.g., on planar graphs. A planar graph always has a vertex of degree at most 5, and since every contraction of a planar graph is again planar, MMD and MMD+ can never give lower bounds larger than 5 on planar graphs. Similar behaviour could be observed in experiments on 'almost' planar graphs.

Apart from its function as a treewidth lower bound, contraction degeneracy appears to be an attractive and elementary graph measure, worth further study. For instance, interesting topics are its computational complexity on special graph classes and the complexity of approximation algorithms with a guaranteed performance ratio. While we know that the fixed parameter cases are polynomial time solvable, it is desirable to have algorithms that solve, e.g., contraction degeneracy for small values of k efficiently in practice.



Contraction and Treewidth Lower Bounds 






References 

1. H. L. Bodlaender. A linear time algorithm for finding tree-decompositions of small 
treewidth. SIAM J. Comput., 25:1305-1317, 1996. 

2. H. L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comp. Sc., 209:1-45, 1998.

3. H. L. Bodlaender and A. M. C. A. Koster. On the maximum cardinality search 
lower bound for treewidth, 2004. Extended abstract to appear in proceedings WG 
2004. 

4. F. Clautiaux, J. Carlier, A. Moukrim, and S. Negre. New lower and upper bounds for graph treewidth. In J. D. P. Rolim, editor, Proceedings International Workshop on Experimental and Efficient Algorithms, WEA 2003, pages 70-80. Springer Verlag, Lecture Notes in Computer Science, vol. 2647, 2003.

5. B. Courcelle. The monadic second-order logic of graphs I: Recognizable sets of 
finite graphs. Information and Computation, 85:12-75, 1990. 

6. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1998. 

7. V. Gogate and R. Dechter. A complete anytime algorithm for treewidth. To appear in proceedings UAI'04, Uncertainty in Artificial Intelligence, 2004.

8. A. M. C. A. Koster. Frequency assignment - Models and Algorithms. PhD thesis, 
Univ. Maastricht, Maastricht, the Netherlands, 1999. 

9. A. M. C. A. Koster, H. L. Bodlaender, and S. P. M. van Hoesel. Treewidth: Computational experiments. In H. Broersma, U. Faigle, J. Hurink, and S. Pickl, editors, Electronic Notes in Discrete Mathematics, volume 8. Elsevier Science Publishers, 2001.

10. A. M. C. A. Koster, S. P. M. van Hoesel, and A. W. J. Kolen. Solving partial constraint satisfaction problems with tree decomposition. Networks, 40:170-180, 2002.

11. S. J. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on 
graphical structures and their application to expert systems. The Journal of the 
Royal Statistical Society. Series B (Methodological), 50:157-224, 1988. 

12. B. Lucena. A new lower bound for tree-width using maximum cardinality search. SIAM J. Disc. Math., 16:345-353, 2003.

13. S. Ramachandramurthi. A lower bound for treewidth and its consequences. In E. W. Mayr, G. Schmidt, and G. Tinhofer, editors, Proceedings 20th International Workshop on Graph Theoretic Concepts in Computer Science WG'94, pages 14-25. Springer Verlag, Lecture Notes in Computer Science, vol. 903, 1995.

14. S. Ramachandramurthi. The structure and number of obstructions to treewidth. 
SIAM J. Disc. Math., 10:146-157, 1997. 

15. N. Robertson and P. D. Seymour. Graph minors. XX. Wagner’s conjecture. To 
appear. 

16. N. Robertson and P. D. Seymour. Graph minors. XIII. The disjoint paths problem. 
J. Comb. Theory Series B, 63:65-110, 1995. 

17. N. Robertson, P. D. Seymour, and R. Thomas. Quickly excluding a planar graph. 
J. Comb. Theory Series B, 62:323-348, 1994. 

18. R. E. Tarjan and M. Yannakakis. Simple linear time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput., 13:566-579, 1984.

19. Treewidthlib. http://www.cs.uu.nl/people/hansb/treewidthlib, 2004-03-31. 



Load Balancing of Indivisible Unit Size Tokens 
in Dynamic and Heterogeneous Networks* 



Robert Elsässer, Burkhard Monien, and Stefan Schamberger



Institute for Computer Science 
University of Paderborn 
Fürstenallee 11, D-33102 Paderborn
{elsa, bm, schaum}@uni-paderborn.de



Abstract. The task of balancing dynamically generated work load oc- 
curs in a wide range of parallel and distributed applications. Diffusion 
based schemes, which belong to the class of nearest neighbor load bal- 
ancing algorithms, are a popular way to address this problem. Originally 
created to equalize the amount of arbitrarily divisible load among the 
nodes of a static and homogeneous network, they have been generalized 
to heterogeneous topologies. Additionally, some simple diffusion algo- 
rithms have been adapted to work in dynamic networks as well. However, 
if the load is not divisible arbitrarily but consists of indivisible unit size 
tokens, diffusion schemes are not able to balance the load properly. In 
this paper we consider the problem of balancing indivisible unit size to- 
kens on dynamic and heterogeneous systems. By modifying a randomized 
strategy invented for homogeneous systems, we can achieve an asymptotically minimal expected overload in $\ell_1$, $\ell_2$ and $\ell_\infty$ norm while only slightly
increasing the run-time by a logarithmic factor. Our experiments show 
that this additional factor is usually not required in applications. 



1 Introduction 

Load balancing is a very important prerequisite for the efficient use of parallel computers. Many distributed applications produce work load dynamically, which often leads to dramatic differences in runtime. Thus, in order to achieve an efficient use of the processor network, the amount of work has to be balanced during the computation. Obviously, we can ensure an overall benefit only if the balancing scheme itself is highly efficient.

If load is arbitrarily divisible, the balancing problem for a homogeneous network with n nodes can be described as follows. At the beginning, each node i contains some work load $w_i$. The goal is to obtain the balanced work load $\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i$ on all nodes. We assume for now that no load is generated or consumed during the balancing process, i.e., we consider a static load balancing scenario.
SFB-376 and by the IST Program of the EU under contract numbers IST-1999-14186
A popular class of load balancing algorithms consists of diffusion schemes, e.g. [1]. They work iteratively, and each node migrates a load fraction (flow) over the topology's communication links (edges), depending on the work load difference to its neighbors. Hence, these schemes operate locally and therefore avoid expensive global communication. Formally, if we denote the load of node i after k iterations by $w_i^k$ and the flow over edge $\{i, j\}$ in step k by $y_{ij}^{k-1}$, then

$$\forall e = \{i, j\} \in E: \quad y_{ij}^{k-1} = \alpha_{ij}\,(w_i^{k-1} - w_j^{k-1}) \quad \text{and} \quad w_i^k = w_i^{k-1} - \sum_{\{i, j\} \in E} y_{ij}^{k-1} \qquad (1)$$

is computed, where all $\alpha_{ij}$ satisfy the conditions described in the next section. Equation 1 can be rewritten in matrix notation as $w^k = M w^{k-1}$, where the matrix M is called the Diffusion Matrix of the network. This algorithm is called First Order Scheme (FOS) and converges to a balanced state, computing the $\ell_2$-minimal flow. Improvements of FOS are the Second Order Schemes (SOS), which perform faster than FOS by almost a quadratic factor [2,3].
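A minimal Python sketch of the FOS iteration of Equation 1 on a homogeneous network follows. It assumes the weights $\alpha_{ij} = 1/(c \cdot \max\{d_i, d_j\})$ with c = 2, one admissible choice that matches the weights used in Lemma 1; the function names are hypothetical.

```python
def diffusion_weights(n, edges, c=2.0):
    """alpha_ij = 1 / (c * max(d_i, d_j)) for every edge {i, j} (assumed choice)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return {(i, j): 1.0 / (c * max(deg[i], deg[j])) for i, j in edges}


def fos(w, edges, iterations, c=2.0):
    """Run `iterations` rounds of the first order scheme on the loads w."""
    alpha = diffusion_weights(len(w), edges, c)
    w = [float(x) for x in w]
    for _ in range(iterations):
        new = list(w)
        for i, j in edges:
            y = alpha[(i, j)] * (w[i] - w[j])   # flow y_ij from i to j
            new[i] -= y
            new[j] += y
        w = new
    return w
```

On the path 0-1-2-3 with initial loads [12, 0, 0, 0], 500 rounds bring every node to the average load 3 up to floating-point error, while the total load of 12 is preserved.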

In [4], these schemes are extended to heterogeneous networks consisting of processors with different computing powers. In such an environment, computations perform faster if the load is balanced proportionally to the nodes' computing speeds $s_i$:

$$\bar{w}_i = \frac{\sum_{j=1}^{n} w_j}{\sum_{j=1}^{n} s_j}\, s_i \qquad (2)$$



Obviously, this is a generalization of the load balancing problem in homogeneous networks ($s_i = 1$). Diffusion in networks with communication links of different capacities is analyzed e.g. in [3,5]. It is shown that the existing balancing schemes can be generalized such that, roughly speaking, faster communication links get a higher load migration volume than slower ones. The two generalizations can be combined as described in [4]. Heterogeneous topologies are extremely attractive because they often appear in real hardware installations containing machines and networks of different capabilities. In [6] it is demonstrated that FOS is also applicable to balancing load in dynamic networks where edges fail from time to time or are present depending on the distance of the moving nodes.

In contrast to the above implementations, work load in real world applications usually cannot be divided arbitrarily often, but only to some extent. Thus, to address the load balancing problem properly, a more restrictive model is needed. The unit-size token model [2] assumes a smallest load entity, the unit-size token. Furthermore, work load is always represented by a multiple of this smallest entity. It is clear that load in this model is usually not completely balanceable. Unfortunately, as shown for FOS in homogeneous networks in e.g. [2,7,8], the use of integer values prevents the known diffusion schemes from balancing the load completely. Especially if only a relatively small number of tokens exists, a considerable load difference remains.

To measure the balance quality in a network, usually two metrics are considered. The $\ell_2$-norm expresses how well the load is distributed on the processors; hence the error in this norm, $\|e^k\|_2 = \big(\sum_{i=1}^{n} (w_i^k - \bar{w}_i)^2\big)^{1/2}$, should be minimized. One is even more interested in minimizing the overall computation time and therefore the load in $\ell_\infty$-norm, which is equivalent to minimizing the maximal (weighted) load occurring on the processors. Recall that if the error $\|e\|_\infty = \max_i |\bar{w}_i - w_i|$ is less than b, then $\|e\|_2$ is bounded by $\sqrt{n} \cdot b$.
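The bound $\|e\|_2 \le \sqrt{n}\,\|e\|_\infty$ holds because each of the n squared error terms is at most $\|e\|_\infty^2$. A tiny Python check, with a hypothetical function name:

```python
import math


def balance_errors(w, target):
    """Return (||e||_2, ||e||_inf) for the error vector e = w - target."""
    e = [wi - ti for wi, ti in zip(w, target)]
    l2 = math.sqrt(sum(x * x for x in e))
    linf = max(abs(x) for x in e)
    return l2, linf
```

For loads [5, 3, 4, 4] and target 4 on every node, the errors are $\sqrt{2}$ in $\ell_2$ and 1 in $\ell_\infty$, well within $\sqrt{4} \cdot 1 = 2$.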

The problem of balancing indivisible tokens in homogeneous networks is addressed e.g. in [9]. The proposed algorithm improves the local balance significantly, but still cannot guarantee a satisfactory global balance. The authors of [2] use a discretization of FOS by transmitting the flow $y_{ij}^{k-1} = \lfloor \alpha_{ij}(w_i^{k-1} - w_j^{k-1}) \rfloor$ over edge $\{i, j\}$ in step k. After $k = O((d/\lambda_2)\log(En))$ steps, the error in $\ell_2$-norm is reduced to $O(n d^2/\lambda_2)$, where $E = \max_i |\bar{w} - w_i^0|$ is the maximal initial load imbalance in the network, $\lambda_2$ denotes the second smallest eigenvalue of the Laplacian of the graph (see next section), and d is the maximal vertex degree. This bound has been improved by showing that the error in $\ell_\infty$-norm is $O((d^2/\lambda_2)\log(n))$ after $k = O((d/\lambda_2)\log(En))$ iterations [7]. Furthermore, there exist a number of results that improve the balance on special topologies, e.g. [10,11,12]. The randomized algorithm developed in [8] reduces $\|e^k\|_2$ to $O(\sqrt{n})$ in $k = O((d/\lambda_2)(\log(E) + \log(n)\log(\log(n))))$ iteration steps with high probability, if $\bar{w}$ exceeds a certain threshold. Since it is also shown in [8] that $O(\sqrt{n})$ is the best expected asymptotic result in $\ell_2$-norm that can be achieved concerning the final load distribution, this algorithm provides an asymptotically optimal result. However, concerning the minimization of the overall computation time, the above algorithm only guarantees $\|w^k\|_\infty \le \bar{w} + O(\log(n))$, even if $k \to \infty$. Therefore, this algorithm does not achieve an optimal situation concerning the maximal overload in the system.
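Why rounding alone is not enough can be seen in a small Python sketch of FOS with integer tokens, in which every flow is rounded down to whole tokens and moves from the heavier to the lighter endpoint. This is an illustration of the rounding-down idea under the assumed weights $\alpha_{ij} = 1/(c \cdot \max\{d_i, d_j\})$, not the exact procedure of [2]; the function name is hypothetical.

```python
import math
from fractions import Fraction


def discrete_fos(w, edges, iterations, c=2):
    """FOS with integer tokens: every flow is rounded down to whole tokens.

    Flow moves from the heavier to the lighter endpoint of each edge with
    magnitude floor(alpha_ij * |w_i - w_j|), where alpha_ij = 1/(c*max(d_i, d_j))
    is an assumed choice. Fractions keep the floor operation exact.
    """
    deg = [0] * len(w)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    alpha = {(i, j): Fraction(1, c * max(deg[i], deg[j])) for i, j in edges}
    w = list(w)
    for _ in range(iterations):
        new = list(w)
        for i, j in edges:
            hi, lo = (i, j) if w[i] >= w[j] else (j, i)
            y = math.floor(alpha[(i, j)] * (w[hi] - w[lo]))
            new[hi] -= y
            new[lo] += y
        w = new
    return w
```

On the path 0-1-2-3 with loads [12, 0, 0, 0], the process stalls at [7, 4, 1, 0]: every neighbouring difference is below $1/\alpha_{ij} = 4$, so all rounded flows are zero, although the average load is 3.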

In this paper, we present a modification of the randomized algorithm from [8] in order to minimize the overload in the system in $\ell_1$-, $\ell_2$- and $\ell_\infty$-norm, respectively. We show that this algorithm can also be applied in heterogeneous networks, achieving the corresponding asymptotic overload for the weighted $\ell_1$-, $\ell_2$-, or $\ell_\infty$-norms (for a definition refer to Section 3). The algorithm's run-time increases slightly, by at most a logarithmic factor, and is bounded by $O((d/\lambda_2) \cdot (\log(E)\log^2(n)))$ with high probability, whenever $\bar{w}$ exceeds some threshold. Note that from our experiments we can conclude that the additional run-time is usually not needed. Furthermore, our results can be generalized to dynamic networks, which are modeled as graphs with a constant node set but a varying set of edges during the iterations. Similar dynamic networks have been used to analyze the token distribution problem in [13]. However, in contrast to diffusion, the latter only allows one single token to be sent over an edge in each iteration.

The outline of the paper is as follows. In the next section, we give an overview of the basic definitions and the necessary theoretical background. Then, we describe the randomized strategy for heterogeneous networks and show that this algorithm minimizes the overload to the asymptotically best expected value whenever the heterogeneity obeys some restrictions. In Section 4, we present some experimental results, and finally Section 5 contains our conclusion.
2 Background and Definitions

Let G = (V, E) be an undirected, connected, weighted graph with |V| = n nodes and |E| = m edges. Let $c_{ij}$ be the capacity of edge $e_{ij} \in E$, $s_i$ the processing speed of node $i \in V$, and $w_i \in \mathbb{R}$ its work load. In case of indivisible load entities, this value represents the number of unit-size tokens.

Let $A \in \mathbb{R}^{n \times n}$ be the Weighted Adjacency Matrix of G. As G is undirected, A is symmetric. Column/row i of A contains $c_{ij}$ where j and i are neighbors in G. For some of our constructions we need the Laplacian $L \in \mathbb{R}^{n \times n}$ of G, defined as $L := D - A$, where $D \in \mathbb{R}^{n \times n}$ contains the weighted degrees as diagonal entries, i.e. $D_{ii} = \sum_{\{i, j\} \in E} c_{ij}$, and 0 elsewhere.

The Laplacian L and its eigenvalues are used to analyze the behavior of diffusion in homogeneous networks. For heterogeneous networks we have to apply the generalized Laplacian $LS^{-1}$, where $S \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix containing the processor speeds $s_i$. We assume that $1 \le s_i \le O(n^\delta)$ with $\delta < 1$.

Generalizing the local iterative algorithm of Equation 1 for arbitrarily divisible tokens to heterogeneous networks, one obtains [4]:

$$w_i^k = w_i^{k-1} - \sum_{j \in N(i)} c_{ij}\left(\frac{w_i^{k-1}}{s_i} - \frac{w_j^{k-1}}{s_j}\right) \qquad (3)$$

Here, N(i) denotes the set of neighbors of $i \in V$, and $w_i^k$ is the load of node i after the k-th iteration. As mentioned, this scheme is known as FOS and converges to the balanced load $\bar{w}$ of Equation 2 [4]. It can be written in matrix notation as $w^k = M w^{k-1}$ with $M = I - LS^{-1} \in \mathbb{R}^{n \times n}$. M contains $c_{ij}/s_j$ at position $(i, j)$ for every edge $e = \{i, j\}$, $1 - \sum_{e = \{i, j\} \in E} c_{ij}/s_i$ at diagonal entry i, and 0 elsewhere.

Now, let $\lambda_i$, $1 \le i \le n$, be the eigenvalues of the Laplacian $LS^{-1}$ in increasing order. It is known that $\lambda_1 = 0$ with eigenvector $(s_1, \ldots, s_n)$ [14]. The values of $c_{ij}$ have to be chosen such that the diffusion matrix M has the eigenvalues $-1 < \mu_i = 1 - \lambda_i \le 1$. Since G is connected, the first eigenvalue $\mu_1 = 1$ is simple with eigenvector $(s_1, \ldots, s_n)$. We denote by $\gamma = \max\{|\mu_2|, |\mu_n|\} < 1$ the second largest eigenvalue of M according to absolute values and call it the diffusion norm of M.
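Equation 3 and the target of Equation 2 can be sketched in Python as follows. The weights $c_{ij} = 1/(c \cdot \max\{d_i, d_j\})$ with c = 2 and the function names are assumptions for illustration. The flow over $\{i, j\}$ vanishes exactly when $w_i/s_i = w_j/s_j$, so on a connected graph the fixed point is the speed-proportional distribution of Equation 2.

```python
def hetero_fos(w, s, edges, iterations, c=2.0):
    """First order scheme for heterogeneous networks (Equation 3).

    w: initial loads, s: processor speeds; the weights
    c_ij = 1/(c * max(d_i, d_j)) are an assumed choice.
    The flow over {i, j} is c_ij * (w_i/s_i - w_j/s_j).
    """
    n = len(w)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    cw = {(i, j): 1.0 / (c * max(deg[i], deg[j])) for i, j in edges}
    w = [float(x) for x in w]
    for _ in range(iterations):
        new = list(w)
        for i, j in edges:
            y = cw[(i, j)] * (w[i] / s[i] - w[j] / s[j])
            new[i] -= y
            new[j] += y
        w = new
    return w


def proportional_target(w, s):
    """Balanced loads of Equation 2: (sum of loads / sum of speeds) * s_i."""
    ratio = sum(w) / sum(s)
    return [ratio * si for si in s]
```

On the path 0-1-2 with speeds [1, 2, 1] and loads [8, 0, 0], the iteration approaches the target [2, 4, 2].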

Several modifications to FOS have been discussed in the past. One of them is SOS [15], which has the form

$$w^1 = M w^0, \qquad w^k = \beta M w^{k-1} + (1 - \beta) w^{k-2}, \quad k = 2, 3, \ldots \qquad (4)$$

Setting $\beta = 2/(1 + \sqrt{1 - \gamma^2})$ results in the fastest convergence.

As described in the introduction, we denote the error after k iteration steps by $e^k$, where $e^k = w^k - \bar{w}$. The convergence rate of diffusion schemes in the case of arbitrarily divisible tokens depends on how fast the system becomes $\varepsilon$-balanced, i.e., the final error $\|e^k\|_2$ is less than $\varepsilon \|e^0\|_2$. By using simple calculations and the results from [2,4], we obtain the following lemma.
Lemma 1. Let G be a graph and L its Laplacian, where $c_{ij} = 1/(c \cdot \max\{d_i, d_j\})$, $d_i$ is the degree of node i and $c \in (1, 2]$ a constant. Let $M = I - LS^{-1}$ be the diffusion matrix of G and set $\beta = 2/(1 + \sqrt{1 - \gamma^2})$. Then, FOS and SOS require $\Theta((d/\lambda_2) \cdot \ln(s_{\max}/(\varepsilon s_{\min})))$ and $\Theta(\sqrt{d/\lambda_2} \cdot \ln(s_{\max}/(\varepsilon s_{\min})))$ steps, respectively, to $\varepsilon$-balance the system, where $s_{\max} = \max_i s_i$ and $s_{\min} = \min_i s_i$.
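To make the gap between FOS and SOS in Lemma 1 tangible, the following sketch counts iterations until $\varepsilon$-balance for FOS and for SOS (Equation 4) on a homogeneous cycle $C_n$. Assumptions: all speeds are 1 and $\alpha_{ij} = 1/4$ (c = 2, degree 2), so that $M = I - L/4$ has the eigenvalues $(1 + \cos(2\pi k/n))/2$ and $\gamma = (1 + \cos(2\pi/n))/2$ is known in closed form. The function names are hypothetical, and the example illustrates the speed-up rather than verifying the $\Theta$ bounds.

```python
import math


def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]


def apply_m(w, edges, alpha):
    """One multiplication by the diffusion matrix M (one FOS step)."""
    out = list(w)
    for i, j in edges:
        y = alpha * (w[i] - w[j])
        out[i] -= y
        out[j] += y
    return out


def l2_error(w, avg):
    return math.sqrt(sum((x - avg) ** 2 for x in w))


def steps_to_balance(n, eps, second_order, max_steps=100000):
    """Iterations until ||e^k||_2 <= eps * ||e^0||_2 on the cycle C_n."""
    edges = cycle_edges(n)
    alpha = 0.25                                   # 1/(c * max degree), c = 2
    gamma = (1 + math.cos(2 * math.pi / n)) / 2    # diffusion norm of M on C_n
    beta = 2 / (1 + math.sqrt(1 - gamma ** 2))     # optimal SOS parameter
    w0 = [float(n)] + [0.0] * (n - 1)              # all load on node 0
    avg = sum(w0) / n
    target = eps * l2_error(w0, avg)
    prev, cur = w0, apply_m(w0, edges, alpha)      # w^0 and w^1 = M w^0
    k = 1
    while l2_error(cur, avg) > target and k < max_steps:
        nxt = apply_m(cur, edges, alpha)
        if second_order:                           # Equation 4
            nxt = [beta * a + (1 - beta) * b for a, b in zip(nxt, prev)]
        prev, cur = cur, nxt
        k += 1
    return k
```

For n = 32 and $\varepsilon = 10^{-3}$, the second order scheme needs far fewer iterations than FOS; both schemes preserve the total load, so the average stays fixed throughout.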

Although SOS converges faster than FOS by almost a quadratic factor and therefore seems preferable, it has a drawback. During the iterations, the outgoing flow of a node i may exceed $w_i$, which results in negative load. On static networks, the two-phase model copes with this problem [3]. The first phase determines the balancing flow, while the second phase then migrates the load accordingly. However, this two-phase model cannot be applied to dynamic networks, because edges included in the flow computation might not be available for migration.

Even in cases where the nodes' outgoing flow does not exceed their load, we have not yet been able to show the convergence of SOS in dynamic systems. Hence, our analysis of diffusion schemes on such networks is restricted to FOS.

Since the diffusion matrix $M$ is stochastic (each of its columns sums to 1), FOS can also be interpreted as a Markov process, where $M$ is the corresponding transition matrix and the balanced load situation is the related stationary distribution. Using this Markov chain approach, the following lemma can be stated [16,17].

Lemma 2. Let $G$ be a graph and assume that a token performs a random walk on $G$ according to the diffusion matrix $M$. If the values $s_i$ are in a range $[1, n^{\delta}]$, $\delta \in [0,1)$, then a constant $\alpha$ exists such that after $\alpha (d \cdot \log(n))/\lambda_2$ steps the probability $p_i$ of the token being situated on some node $i$ is bounded by

(sz/e;=i - (iM < Pz < (sz/e;=i + (iM- 

3 The Randomized Algorithm 

In this section, we describe a randomized algorithm that guarantees a final weighted overload of $O(1)$ in $\ell_\infty$-norm for the load balancing problem of indivisible unit-size tokens in heterogeneous networks.

First, we show that neither FOS nor SOS guarantees a satisfactory balance in the case of unit-size tokens on inhomogeneous networks. As mentioned, the outgoing load from some node $i$ may exceed $w_i$ when applying SOS. Hence, simply rounding down the flow to the nearest integer does not lead to the desired algorithm. Therefore, the authors of [2] introduced a so-called 'I Owe You' (IOU) unit on each edge. These IOUs represent the difference between the total flow moved along an edge in the idealized scheme and in the realistic algorithm. If the IOUs accumulate, then we cannot prove any result w.r.t. the convergence of the adapted SOS. Nevertheless, in most of the simulations on static networks, even if IOUs emerge, the number of owed units becomes very small after a few rounds.

If we assume that the IOUs tend to zero after a few iterations, then we can use the techniques of [18] to state the following theorem.
Theorem 1. Let $G = (V,E)$ be a node weighted graph, where $s_i$ is the weight of node $i \in V$, and let $w^0$ be the initial load distribution on $G$. Executing SOS on $G$, we assume that after a few iteration steps the IOUs are bounded by some constant on each edge of the graph. Then, after a sufficient number of steps $k$, the error $\|e^k\|_2$ is $O((\sqrt{n} \cdot s_{\max} \cdot d)/(\sqrt{s_{\min}} \cdot (1-\gamma)))$, where $\gamma$ is the diffusion norm of the corresponding diffusion matrix $M$.

Proof. During the SOS iterations (Equation 4), we round the flow such that only integer load values are shifted over the edges. Hence, we get

$$w^k = \beta M w^{k-1} - (\beta - 1)\, w^{k-2} + \delta^{k-1}, \qquad (5)$$

where $\delta^k$ is the round-off error in iteration $k$. Comparing this to the balancing algorithm with arbitrarily divisible load $w'^k$ (Equation 4), we get an accumulated round-off error $a^k = w^k - w'^k$, where $a^0 = 0$ and $a^1 = \delta^0$. We know that for any eigenvector $z_i$ of $M$ an eigenvector $u_i$ of $S^{-1/2} L S^{-1/2}$ exists such that $z_i = S^{1/2} u_i$. We also know that the eigenvectors $z_i$ form a basis of $\mathbb{R}^n$ [4]. On the other hand, $\delta^k = \sum_{i=2}^{n} \beta_i z_i$ for some $\beta_2, \ldots, \beta_n \in \mathbb{R}$ and any $k \in \mathbb{N}$. Combining the techniques from [18] and [4], we therefore obtain

$$\|a^k\|_2 \le \sum_{i=0}^{k-1} \cdots \le \frac{\cdots}{(1-r)^2} \left(k + \frac{\cdots}{1-r}\right),$$

where $\delta$ is chosen such that $\|\delta^j\|_2 \le \delta$ for any $j \in \{0, \ldots, k-1\}$ and $r = \sqrt{\beta - 1}$. Since $\delta \le \sqrt{n} \cdot d$, the theorem follows. □

Using similar techniques, the same bound can also be obtained for FOS (without the restrictions). Since $1/(1-\gamma)$ is polynomially bounded in $n$ and $d \le n - 1$, the load discrepancy can be at most $O(n^t)$, with $t$ being a small constant.

We now present a randomized strategy that reduces the final weighted load to an asymptotically optimal value. In the sequel, we only consider the load in $\ell_\infty$-norm, although we can also prove that the algorithm described below achieves an asymptotically optimal balance in $\ell_1$- and $\ell_2$-norm.

In order to show the optimality, we have to consider the lower bound on the final weighted load deviation. Let $\tilde{w}_i^k = w_i^k/\tilde{s}_i$ be the weighted load of node $i$ after $k$ steps, where $\tilde{s}_i = n \cdot s_i / \sum_{j=1}^{n} s_j$ is its normalized processing speed.

Proposition 1. There exist node weighted graphs $G = (V,E)$ and initial load distributions $w$ such that, for any load balancing scheme working with indivisible unit-size tokens, the expected final weighted load in $\ell_\infty$-norm, $\lim_{k \to \infty} \|\tilde{w}^k\|_\infty$, is $\bar{w}_i/\tilde{s}_i + \Omega(1)$.



To see this, let $G$ be a homogeneous network ($s_i = 1$ for $i \in V$) and let $\bar{w}_i = b' + b$, where $b' \in \mathbb{N}$ and $b \in (0,1)$. Obviously, this fulfills the proposition.
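The lower bound in this example is a pigeonhole argument. With $s_i = 1$ the weighted load equals the load. The $n$ integer loads $w_i^k$ always sum to $n(b' + b)$, so some node carries at least the average, rounded up to an integer:

```latex
\max_{i} w_i^k \;\ge\; \left\lceil b' + b \right\rceil \;=\; b' + 1 \;=\; \bar{w}_i + (1 - b)
\qquad \text{for all } k .
```

Hence the final overload in $\ell_\infty$-norm is at least $1 - b$, which is a positive constant for any fixed $b \in (0,1)$ (for example $b = 1/2$).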

The following randomized algorithm reduces the final weighted load $\|\tilde{w}^k\|_\infty$ to $\bar{w}_i/\tilde{s}_i + O(1)$ in heterogeneous networks. We assume that the weights $s_i$ lie in the range $[1, n^{\delta}]$ with $\delta < 1$, and that the number of tokens is high enough.
Note that the results below can also be obtained if these assumptions do not hold; however, for simplicity we prove the results under these conditions. Although we mainly describe the algorithm for static networks, it can also be used on dynamic networks, obtaining the same bound on the final load deviation.

The algorithm we propose consists of two main phases. In the first phase, we perform FOS as described in [4] to reduce the load imbalance in $\ell_1$-norm to $O(n^t)$, where $t$ is a small constant value. At the same time, we approximate $\bar{w}_i$ within an error of $\pm 1/n$ by using the diffusion algorithm for arbitrarily divisible load. In the second phase, we perform $O(\log(n))$ so-called main random walk rounds (MRWR), described in the following.

In a preparation step, we mark the tokens that take part in the random walk and also introduce participating “negative tokens” that represent load deficiencies. Let $w_i$ be the current (integer) load and $R$ be the set of nodes with $\tilde{s}_i < 1/3$. If a node $i \in R$ contains less than $\lfloor \bar{w}_i \rfloor$ tokens, we place $\lfloor \bar{w}_i \rfloor - w_i$ “negative tokens” on it. If $i \in R$ owns more than $\lfloor \bar{w}_i \rfloor$ tokens, we mark $w_i - \lfloor \bar{w}_i \rfloor$ of them. On a node $i \in V \setminus R$ with less than $\lceil \bar{w}_i \rceil + \lceil 2\tilde{s}_i \rceil$ tokens, we create $\lceil \bar{w}_i \rceil + \lceil 2\tilde{s}_i \rceil - w_i$ “negative tokens”. Accordingly, if $i \in V \setminus R$ contains more than $\lceil \bar{w}_i \rceil + \lceil 2\tilde{s}_i \rceil$ tokens, we mark $w_i - (\lceil \bar{w}_i \rceil + \lceil 2\tilde{s}_i \rceil)$ of them. Obviously, the number of “negative tokens” exceeds the number of marked tokens by an additive value of $\Omega(n)$. Now, in each MRWR all “negative tokens” and all marked tokens perform a random walk of length $(\alpha d/\lambda_2) \ln(n)$ according to $M$, where $\alpha$ is the same constant as in Lemma 2. Note that if a “negative token” is moved from node $i$ to node $j$, a real token is transferred in the opposite direction, from node $j$ to node $i$. Furthermore, if a “negative token” and a marked token meet on a node, they eliminate each other. We now show that one MRWR reduces the total number of marked tokens that produce the overload in the system by at least a constant factor $c$.
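The preparation step and one MRWR can be sketched in Python as follows. The sketch assumes that $\bar{w}_i$ is known exactly, which is simpler than the $\pm 1/n$ approximation of the first phase. It also takes the walk length as a free parameter instead of $(\alpha d/\lambda_2)\ln(n)$, because $\alpha$ is not specified. Pairs are eliminated only after the full round, which is the simplified variant used in the analysis below. The graph, the speeds, the loads and all function names are hypothetical.

```python
import math
import random


def diffusion_matrix(adj, s, c=2.0):
    """M = I - L S^{-1} with c_ij = 1 / (c * max(d_i, d_j)); column j sums to 1."""
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = 1.0
        for j in adj[i]:
            cij = 1.0 / (c * max(deg[i], deg[j]))
            M[i][j] = cij / s[j]
            M[i][i] -= cij / s[i]
    return M


def walk_step(M, j, rng):
    """Move a walker from node j to node i with probability M[i][j]."""
    r, acc = rng.random(), 0.0
    for i, row in enumerate(M):
        acc += row[j]
        if r < acc:
            return i
    return j  # guard against floating-point round-off in the column sum


def threshold(wbar_i, st_i):
    """floor(wbar_i) if st_i < 1/3, otherwise ceil(wbar_i) + ceil(2 * st_i)."""
    if st_i < 1 / 3:
        return math.floor(wbar_i)
    return math.ceil(wbar_i) + math.ceil(2 * st_i)


def prepare(w, wbar, st):
    """Start nodes of the 'negative tokens' and of the marked tokens."""
    neg, marked = [], []
    for i, wi in enumerate(w):
        t = threshold(wbar[i], st[i])
        neg += [i] * max(0, t - wi)
        marked += [i] * max(0, wi - t)
    return neg, marked


def mrwr(M, w, neg, marked, length, rng):
    """One main random walk round; the integer loads w are updated in place."""
    for _ in range(length):
        for k, a in enumerate(marked):  # a marked real token moves a -> b
            b = walk_step(M, a, rng)
            w[a] -= 1
            w[b] += 1
            marked[k] = b
        for k, a in enumerate(neg):  # a negative token moves a -> b,
            b = walk_step(M, a, rng)  # so a real token moves b -> a
            w[b] -= 1
            w[a] += 1
            neg[k] = b
    n = len(w)
    cnt_neg, cnt_mark = [0] * n, [0] * n
    for a in neg:
        cnt_neg[a] += 1
    for a in marked:
        cnt_mark[a] += 1
    neg_left, marked_left = [], []
    for i in range(n):
        pairs = min(cnt_neg[i], cnt_mark[i])  # these pairs cancel each other
        neg_left += [i] * (cnt_neg[i] - pairs)
        marked_left += [i] * (cnt_mark[i] - pairs)
    return neg_left, marked_left


# Cycle of 8 nodes with speeds 1 or 2; w is a hypothetical load after phase one.
n = 8
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
s = [1, 1, 2, 2, 1, 1, 2, 2]
w = [18, 8, 21, 25, 9, 11, 17, 11]
total = sum(w)
st = [n * si / sum(s) for si in s]        # normalized speeds
wbar = [total * si / sum(s) for si in s]  # balanced load (exact in this sketch)
M = diffusion_matrix(adj, s)
rng = random.Random(1)
neg, marked = prepare(w, wbar, st)
history = [len(marked)]
for _ in range(10):
    neg, marked = mrwr(M, w, neg, marked, length=20, rng=rng)
    history.append(len(marked))
print(history, w)
```

Each elimination removes one token of each kind. The surplus of negative tokens over marked ones therefore stays constant, while the number of marked tokens can only shrink.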

It follows immediately from the description of the algorithm that the number of iteration steps is $O((d/\lambda_2)(\log^2(n) + \log(E)))$. In the remaining part of this section we prove the correctness of the algorithm. In contrast to the above description, we assume for the analysis that “negative tokens” and marked tokens do not eliminate each other instantly, but only after completing a full MRWR. This modification simplifies the proof, although the same bounds can also be obtained for the original randomized algorithm.

Lemma 3. Let $G = (V,E)$, $|V| = n$ be a node weighted graph. Assume that $qn$ tokens are executing random walks, according to $G$'s diffusion matrix, of length $(\alpha d \ln(n))/\lambda_2$, $q > 32 \ln(n)$, where $\alpha$ is the same constant as in Lemma 2. Then, a constant $c$ exists such that, after completion of the random walk, the number of tokens producing overload is less than $c n \sqrt{q \ln(n)}$ with probability $1 - o(1/n)$.

Lemma 4. Let $G = (V,E)$, $|V| = n$ be a node weighted graph. Assume that $qn$ tokens are executing random walks, according to $G$'s diffusion matrix, of length $(\alpha d \ln(n))/\lambda_2$, where $\alpha$ is the same constant as in Lemma 2. If $q$ is larger than some certain constant, then a constant $c > 4$ exists such that, after completion of the random walk, the number of tokens producing overload is less than $qn/c$ with probability $1 - o(1/n)$.
The proofs of the corresponding lemmas for homogeneous topologies can be found in [8]. Using some additional techniques, we can generalize them to heterogeneous graphs. Due to space limitations, we omit them here.

Lemma 5. Let $G = (V,E)$, $|V| = n$ be a node weighted graph and let $q_1 n$ be the number of marked tokens and $q_2 n$ be the number of “negative tokens” which perform random walks of length $(\alpha d \ln(n))/\lambda_2$ on $G$, respectively. If $q_1, q_2 = O(1)$ with $q_2 n - q_1 n = \Omega(n)$ and $q_1 > \ln(n)/n$, then a constant $c > 1$ exists such that, after completion of the random walk, the number of tokens producing overload is less than $\lceil q_1 n/c \rceil$ with probability $1 - o(1/n)$.

Proof. First, we describe the distribution of the “negative tokens” on $G$'s nodes. Obviously, with probability at least 1/2 a token is placed on a node $i$ with $\tilde{s}_i > 1/2$. Using the Chernoff bound [19], it can be shown that with probability $1 - o(1/n^2)$ at least $q_2 n (1 - o(1))/2$ tokens are situated on nodes with $\tilde{s}_i > 1/2$ after an MRWR. With the help of the Chernoff bound we can also show that for any node $i$ with $\tilde{s}_i > c' \ln(n)$, $c'$ being a proper constant, with probability $1 - o(1/n^2)$ there are at least $q_2 \tilde{s}_i/2$ tokens on $i$ after the MRWR. If $G$ contains less than $c' \ln(n)$ nodes with normalized processing speed $c' \ln(n)/2^j < \tilde{s}_i \le c' \ln(n)/2^{j-1}$ for some $j \in \{1, \ldots, \log(c' \ln(n)) + 1\}$, then a token lies on one of these nodes with probability $O(\ln^2(n)/n)$. Otherwise, if there are $Q > c' \ln(n)$ nodes with speed $c' \ln(n)/2^j < \tilde{s}_i \le c' \ln(n)/2^{j-1}$, then at least $Q/p_j$ of these nodes contain more than $\lceil q_2 c' \ln(n)/2^{j+1} \rceil$ tokens, where $p_j = \max\{\lceil 2/(q_2 c' \ln(n)/2^j) \rceil, 2\}$.

Now we turn our attention to the distribution of the marked tokens after an MRWR. Let $q_1 n > \sqrt{n}$. Certainly, w.h.p. $q_1 n (1 - o(1))/2$ tokens reside on nodes with $\tilde{s}_i > 1/2$. On a node $i$ with normalized processing speed $\tilde{s}_i > c' \ln(n)$, at most $q_1 \tilde{s}_i (1 + o(1)) + O(\ln(n))$ tokens are placed. Therefore, most of the marked tokens on such heavy nodes will be eliminated by the “negative tokens”. Let $S_j$ be the set of nodes with $c' \ln(n)/2^j < \tilde{s}_i \le c' \ln(n)/2^{j-1}$, where $j \in \{1, \ldots, \log(c' \ln(n))\}$. We ignore the tokens in sets $S_j$ with $|S_j| < \sqrt{n}$, since their number adds up to at most $O(\sqrt{q_1 n} \ln^2(n))$. Each of the other sets $S_j$ with $|S_j| \ge \sqrt{n}$ contains $n_s = \Omega(\sqrt{q_1 n})$ marked tokens distributed nearly evenly. Therefore, a constant $d'$ exists such that at least $n_s/d'$ of them are eliminated by “negative tokens”. If $q_1 n < \sqrt{n}$, then after the MRWR a marked token lies on some node $i$ with normalized speed $\tilde{s}_i > 1/2$ with probability at least 1/2. If one of these tokens is situated on some node $i$ with $\tilde{s}_i > c' \ln(n)$, then it is destroyed by some “negative token” with high probability. If a marked token resides on some node of the sets $S_j$, where $j \in \{1, \ldots, \log(c' \ln(n))\}$, then a constant $c'' > 0$ exists such that this token is destroyed with probability $1/c''$. Therefore, a marked token is destroyed with probability at least $1/(2c'')$.

Summarizing, the number of marked tokens is reduced to $\sqrt{n}$ after $O(\ln(n))$ MRWRs. Additional $O(\ln(n))$ MRWRs decrease this number to at most $O(\ln(n))$. These can be eliminated within another $O(\ln(n))$ MRWRs with probability $1 - o(1/n)$. □

Combining Lemmas 3, 4, and 5, we obtain the following theorem.
Theorem 2. Let $G = (V,E)$ be a node weighted graph with maximum vertex degree $d$ and let $\lambda_2$ be the second smallest eigenvalue of the weighted Laplacian of $G$. Furthermore, let $w^0$ be the initial load and $E = \max_i |w_i^0 - \bar{w}_i|$ the initial maximal load imbalance. If $\sum_{i=1}^{n} w_i^0$ exceeds some certain threshold, then the randomized algorithm reduces the weighted load in $\ell_\infty$-norm, $\|\tilde{w}^k\|_\infty$, to $\bar{w}_i/\tilde{s}_i + O(1)$ in $k = O((d/\lambda_2)(\log(E) + \log^2(n)))$ iteration steps with probability $1 - o(1/n)$.

In the previous analysis we assume that the tokens are evenly distributed on the graph's nodes after each MRWR. However, after the marked and “negative tokens” have eliminated each other, the distribution of the remaining entities no longer corresponds to the stationary distribution of the Markov process. For the analysis, we therefore perform another MRWR to regain an even distribution. Admittedly, the tokens are not evenly distributed after the elimination step, but in most cases they are close to the stationary distribution. Hence, the analysis is not tight, which can also be seen in the experiments presented in the next section. Using the techniques of [20], we could improve the upper bound for the runtime of the algorithm for special graph classes. However, we do not see how to obtain better run-time bounds in the general case.

Similar results can also be obtained w.r.t. the $\ell_1$- and $\ell_2$-norm. Moreover, the results can also be generalized to dynamic networks. Dynamic networks can be modeled by a sequence $(G_i)_{i \ge 0}$ of graphs, where $G_i$ is the graph which occurs in iteration step $i$ [13]. Let us assume that any graph $G_i$
is connected and let h > ^ / G ^ 2 ) ^ G N, where C is a 

large constant, represents the maximum vertex degree and $\lambda_2$ denotes the second smallest eigenvalue of $G_i$'s Laplacian. Then, the randomized algorithm reduces $\|\tilde{w}^k\|_\infty$ to $\bar{w}_i/\tilde{s}_i + O(1)$ in $k = O(h(\log(E) + \log^2(n)))$ iteration steps (with probability $1 - o(1/n)$).

4 Experiments 

In this section we present some of the experimental results we obtained with our implementation of the approach proposed in the previous section. Our tests are performed in two different environments. The first one is based on common network topologies and therefore mainly considers static settings. However, by letting each graph edge be present only with a certain probability p in each iteration, it is possible to simulate, e.g., edge failures and therefore dynamics. The second environment is taken from the mobile ad-hoc network community. Here, nodes move around in a simulation space, and a link between two nodes is present if their distance is small enough.
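The edge-failure model of the first environment can be sketched in Python as shown below. The 16 × 16 torus and the 10% failure probability are the settings reported for Figure 1. The helper names, the seed and the adjacency-list representation are assumptions of this sketch. It only generates the graph of one iteration and leaves the balancing step itself out.

```python
import random


def torus(rows, cols):
    """Adjacency lists of the rows x cols torus (4-regular for rows, cols >= 3)."""
    def idx(r, c):
        return (r % rows) * cols + (c % cols)

    return [[idx(r - 1, c), idx(r + 1, c), idx(r, c - 1), idx(r, c + 1)]
            for r in range(rows) for c in range(cols)]


def sample_iteration_graph(adj, p, rng):
    """Graph used in one iteration: each undirected edge {i, j} of adj is
    present independently with probability p (p = 0.9: 10% edge failures)."""
    cur = [[] for _ in adj]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            if i < j and rng.random() < p:  # decide each edge exactly once
                cur[i].append(j)
                cur[j].append(i)
    return cur


rng = random.Random(0)
base = torus(16, 16)
g = sample_iteration_graph(base, 0.9, rng)
degrees = [len(nbrs) for nbrs in g]  # current degrees, used to recompute c_ij
print(sum(degrees) // 2, "of", sum(len(nb) for nb in base) // 2, "edges present")
```

Resampling `g` in every iteration and recomputing the diffusion weights from `degrees` corresponds to adapting the vertex degree in non-static networks, as described below.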

The algorithm works as described in Section 3 and consists of two main phases. First, we balance the tokens as described in [2], sending only the integer amount of the calculated flow. Simultaneously and independently, we simulate the diffusion algorithm with arbitrarily divisible load. Note that in both cases we adapt the vertex degree in non-static networks in every iteration to improve the convergence, as shown in [6]. The additional simulation yields an approximation of the fully balanced load amount. If its error is small enough, the second phase of the algorithm is entered. Each node now calculates its desired load based on its approximated load and its current real integer load, as described in Section 3. Then, the excess load units and the virtual negative tokens start performing random walks until the error between the desired load and the real load becomes small enough.

Fig. 1. Balancing tokens on the 16 × 16 Torus.

The network types included in our tests are general interconnection topologies like Butterfly, Cube Connected Cycle, de Bruijn, Hypercube, Shuffle-Exchange, Grid, and Torus networks. All of these topologies are well known, and some of them are designed to have many favorable properties for distributed computing, e.g., a small vertex degree and diameter and large connectivity.

Figure 1 demonstrates the algorithm's behavior on the 16 × 16 Torus. In this setting, all 16^ tokens are placed on a single node and each edge fails with 10% probability. Furthermore, the nodes' processing speeds vary from 0.8 to 1.2. Applying FOS, the maximal weighted load (left) is initially reduced quickly, while in the end many iterations are needed for only small improvements. At some point, due to the rounding, no further reduction is possible. This is also visible in the $\ell_2$ error (right), which stagnates at around 200.0. In contrast, the error of the arbitrarily divisible load simulation converges steadily. Hence, the proposed load balancing algorithm switches to random walks in iteration 277. One can see (left) that this accelerates the load distribution and also leads to a smaller maximal weighted load than FOS can ever achieve. Note that the optimal load of each node is not exactly known when switching. Hence, the error of the unit-size token distribution cannot become smaller than that of the approximated solution (right).

Detailed presentations of experiments based on different graph classes and with varying parameters have to be omitted due to space limitations. Nevertheless, the results are very similar to the given example, even for graphs that are very well suited for diffusion, like the Hypercube. In general, increasing the edge failure probability slows down the algorithm. However, it slightly improves the balance that can be achieved with diffusion, because the average vertex degree is smaller and therefore the rounding error decreases. Furthermore, we observe that far fewer additional iterations are needed than the extra logarithmic factor in the theoretical bound would suggest.

Fig. 2. Balancing tokens in a mobile ad-hoc network.

The second environment we use to simulate load balancing is a mobile ad-hoc network (MANET) model. The simulation area is the unit square, and 256 nodes are placed randomly within it. Prior to each iteration step of the general diffusion scheme, edges are created between nodes depending on their distance. Here, we apply the disc graph model (e.g. [21]), which simply uses a uniform communication radius for all nodes. After executing one load balancing step, all nodes move toward their randomly chosen waypoint. Once they have reached it, they pause for some iterations before continuing to their next randomly chosen destination. This model has been proposed in [22] and is widely applied in the ad-hoc network community to simulate movement. Note that, when determining neighbors as well as during the movement, the unit square is considered to have wrap-around borders: nodes leaving on one side of the square reappear at the corresponding position on the other side. For the experiments, we average the results from 25 independent runs.
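A minimal Python sketch of this mobility model is shown below. It assumes the parameters reported for Figure 2: 256 nodes, communication radius 0.1, speeds between 0.001 and 0.005, and a pause of 3 iterations. The wrap-around distance implements the toroidal unit square described above. The data layout, the seed and all function names are illustrative, and the load balancing step itself is omitted.

```python
import math
import random


def wrap_delta(a, b):
    """Signed shortest difference b - a on the wrap-around interval [0, 1)."""
    d = (b - a) % 1.0
    return d - 1.0 if d > 0.5 else d


def torus_dist(p, q):
    return math.hypot(wrap_delta(p[0], q[0]), wrap_delta(p[1], q[1]))


def disc_graph(pos, radius):
    """Disc graph model: i and j are adjacent iff their distance is <= radius."""
    adj = [[] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if torus_dist(pos[i], pos[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def new_leg(rng, vmin, vmax):
    """Random waypoint in the unit square and a speed for the next leg."""
    return (rng.random(), rng.random()), rng.uniform(vmin, vmax)


def move(nodes, rng, vmin=0.001, vmax=0.005, pause_len=3):
    """One random-waypoint step for every node on the wrap-around unit square."""
    for nd in nodes:
        if nd["pause"] > 0:
            nd["pause"] -= 1
            continue
        (x, y), (tx, ty) = nd["pos"], nd["target"]
        dx, dy = wrap_delta(x, tx), wrap_delta(y, ty)
        dist = math.hypot(dx, dy)
        if dist <= nd["speed"]:  # waypoint reached: pause, then head elsewhere
            nd["pos"], nd["pause"] = (tx, ty), pause_len
            nd["target"], nd["speed"] = new_leg(rng, vmin, vmax)
        else:
            f = nd["speed"] / dist
            nd["pos"] = ((x + f * dx) % 1.0, (y + f * dy) % 1.0)


rng = random.Random(42)
nodes = []
for _ in range(256):
    target, speed = new_leg(rng, 0.001, 0.005)
    nodes.append({"pos": (rng.random(), rng.random()), "target": target,
                  "speed": speed, "pause": 0})
for _ in range(10):
    adj = disc_graph([nd["pos"] for nd in nodes], radius=0.1)
    # one load balancing step on adj would be executed here (omitted)
    move(nodes, rng)
```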

The outcome of one experiment is shown in Figure 2. The nodes move quite slowly, with speeds between 0.001 and 0.005. They pause for 3 iterations when they have reached their destination, and their communication radius is set to 0.1. In contrast to the former results, the load can be better balanced when applying FOS only. This is due to the vertex movement, which is an additional means of spreading the tokens in the system and, in later iterations, the only one. Higher movement rates will increase this effect. Nevertheless, the randomized algorithm is again able to speed up the load balancing process, reaching an almost equalized state much earlier.



5 Conclusion 

The presented randomized algorithm balances unit-size tokens in general heterogeneous and dynamic networks without global knowledge. It is able to reduce the weighted maximal overload in the system to $O(1)$ while increasing the run-time by at most a factor of $O(\ln(n))$ compared to the general diffusion scheme. Our experiments show that this additional factor is usually not needed in practice.
Abstract. We study polynomials of degree up to 4 over the rationals or a computable real subfield. Our motivation comes from the need to evaluate predicates in nonlinear computational geometry efficiently and exactly. We show a new method to compare real algebraic numbers by precomputing generalized Sturm sequences, thus avoiding iterative methods; the method, moreover, handles all degenerate cases. Our first contribution is the determination of rational isolating points, as functions of the coefficients, between any pair of real roots. Our second contribution is to exploit invariants and Bezoutian subexpressions in writing the sequences, in order to reduce bit complexity. The degree of the tested quantities in the input coefficients is optimal for degree up to 3, and for degree 4 in certain cases. Our methods readily apply to real solving of pairs of quadratic equations, and to sign determination of polynomials over algebraic numbers of degree up to 4. Our third contribution is an implementation in a new module of the library SYNAPS v2.1. It significantly improves upon the efficiency of certain publicly available implementations: Rioboo's approach on AXIOM, the package of Guibas-Karavelas-Russel [11], and CORE v1.6, MAPLE v9, and SYNAPS v2.0. Some existing limited tests had shown that it is faster than the commercial library LEDA v4.5 for quadratic algebraic numbers.



1 Introduction 

Our motivation comes from computer-aided geometric design and nonlinear computational geometry, where predicates rely on real algebraic numbers of small degree. These are crucial in software libraries such as ESOLID [15], EXACUS (e.g. [12], [2]), and CGAL (e.g. [8]). Predicates must be decided exactly in all cases, including degeneracies. We focus on real algebraic numbers of degree up to 4 and on polynomials in one variable of arbitrary degree or in 2 variables of degree ≤ 2. Efficiency is critical because comparisons on such numbers lie in the inner loop of most algorithms, including those for computing the arrangement of algebraic curves, arcs or surfaces, the Voronoi diagrams of curved objects, e.g. [2,7,12,14], and kinetic data structures [11].

Our work is also a special-purpose quantifier elimination method for one or two variables and for parametric polynomial equalities and inequalities of low degree. Our approach extends [16,22] because our rational isolating points eliminate the need for multiple sign evaluations in determining the sign of a univariate polynomial over an algebraic number of degree ≤ 4. We also extend the existing approaches so as to solve some simple bivariate problems (Sect. 7).

Our method is based on pre-computed generalized Sturm sequences; in other words, we implement straight-line programs for each comparison. Finding isolating points of low algebraic degree (and rational for polynomials of degree ≤ 4) is a problem of independent interest. It provides starting points for iterative algorithms and has direct applications, e.g. [7]. Our Sturm-based algorithms rely on isolating points in order to avoid iterative methods (which depend on separation bounds) and the explosion of the algebraic degree of the tested quantities. In order to reduce the computational effort, we factorize the various quantities by the use of invariants and/or by the elements of the Bezoutian matrix; for our implementation, this is done in an automated way.

We have implemented a package of algebraic numbers as part of the 
SYNAPS 2.1 library [6] and show that it compares favorably with other soft- 
ware. Our code can also exist as a stand-alone C++ software package. We 
call our implementation which stands for Static Sturm Sequences (or 
Salmon-Sturm-Sylvester) . 

The following section overviews some of the most relevant existing work. Next, we formalize Sturm sequences. Sect. 4 studies discrimination systems and their connection to the invariants of the polynomial. Sect. 5 obtains rational isolating points for degree ≤ 4. Sect. 6 bounds complexity, and Sect. 7 applies our tools to sign determination and real solving and indicates some of our techniques for automatic code generation. Sect. 8 illustrates our implementation with experimental results. We conclude with future work.



2 Previous Work and Contribution 

Although the roots of rational polynomials of degree up to 4 can be expressed explicitly with radicals, the computation of the real roots requires square and cubic roots of complex numbers. Even if only the smallest (or largest) root is needed, one has to compute all real roots (cf. [13]). Another critical issue is that there is no formula that provides isolating rational points between the real roots of polynomials; this problem is solved in this paper for degree ≤ 4.
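The first claim is easy to observe numerically. The Python sketch below applies Cardano's formula to $x^3 - 3x + 1$, a textbook example with three real roots. The quantity under the square root is negative, so the real roots only appear after taking cube roots of complex numbers. The example polynomial and the function name are chosen for illustration only.

```python
import cmath


def cardano_roots(p, q):
    """Roots of the depressed cubic t^3 + p t + q (p != 0) by Cardano's formula.

    Returns the quantity under the square root, (q/2)^2 + (p/3)^3, and the
    three roots t_k = u w^k - p / (3 u w^k), where u is a complex cube root of
    -q/2 + sqrt(disc) and w = exp(2 pi i / 3).
    """
    disc = (q / 2) ** 2 + (p / 3) ** 3
    u = (-q / 2 + cmath.sqrt(disc)) ** (1 / 3)  # principal complex cube root
    w = cmath.exp(2j * cmath.pi / 3)
    roots = []
    for k in range(3):
        uk = u * w ** k
        roots.append(uk - p / (3 * uk))
    return disc, roots


# x^3 - 3x + 1 has three real roots, yet disc < 0 forces complex intermediates.
disc, roots = cardano_roots(-3.0, 1.0)
print(disc)                                     # -0.75
print(sorted(round(r.real, 6) for r in roots))  # [-1.879385, 0.347296, 1.532089]
```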

In quantifier elimination, an effort was made to optimize low-level operations, e.g. [16,22]. However, that approach needs multiple Sturm sequences. With our approach, we need to evaluate only one Sturm sequence in order to decide the sign of a polynomial over a cubic or quartic algebraic number.

Our discrimination system for the quartic is the same as that in [23], but it is derived differently and corrects a small error in [23].

Rioboo implemented in AXIOM an arithmetic of real algebraic numbers of arbitrary degree with coefficients from a real closed field [17]. The extension he proposed for the sign evaluation is essentially based upon Theorem 2.
Table 1. Computing the arrangement of 300 circular arcs with: random full circles (rfc), circles centered on the crosspoints and the centers of the cells of a square grid of cell size 10^ (g), the same perturbed (pg), random monotone arcs (rma), random non-monotone arcs (rnma). All the geometry code was done in CGAL, so the first column shows the arithmetic used.

sec          rfc   g    pg   rma   rnma
             137   2    5    10    34
CORE v1.6    -     10   17   20    195
LEDA v4.5    374   5    11   19    104

Iterative methods based on the approach of Descartes/Uspensky seem to be, in general, the fastest means of isolating real roots (cf. [18]). Such methods are implemented in SYNAPS. An iterative method using Sturm sequences has been implemented in [11]. Both methods are tested in Sect. 8.

LEDA and CORE¹ evaluate expression trees built recursively from integer operations and rely on separation bounds. LEDA treats arbitrary algebraic numbers by the diamond operator, based on Descartes/Uspensky iteration. But it faces efficiency problems ([20]) in computing isolating intervals for degree 3 and 4, since Newton's iteration cannot always be applied with interval coefficients. CORE recently provided an operator for dealing with algebraic numbers using Sturm sequences; currently, this operator cannot handle multiple roots.

Precomputed quantities for the comparison of quadratic algebraic numbers were used in [5], derived from the u-resultant and Descartes' rule of signs. In [14], the same problem was solved with static Sturm sequences, thus improving the runtime by up to 10%. In generalizing these methods to higher degree, it is not obvious how to determine the (invariant) quantities to be tested in order to minimize the bit complexity. Other major issues are the isolating points and the need for several Sturm sequences.

The contribution of this paper starts with quadratic numbers, for which we reformulated the existing method in order to make it generalizable to higher degree [10]. The efficiency of our implementation for quadratic numbers is illustrated in [8]; we copy Table 1 from that paper. For algebraic numbers of degree 3 and 4, preliminary results are in [10,9], where more details can be found that cannot fit here for reasons of space. Our novelty is to use a single Sturm sequence for degree up to 4 (see Cor. 4). This rests on three steps. First, we consider the discrimination system of the given polynomial for purposes of root classification and in order to derive a square-free polynomial defining the algebraic number. Second, we derive rational isolating points (Theorems 6 and 10). Finally, we reduce the computational effort by factoring the tested quantities using invariants and elements of the Bezoutian matrix. Our implementation computes these quantities in an automatic way.

¹ http://www.algorithmic-solutions.com/enleda.htm, http://www.cs.nyu.edu/exact/core
3 Sturm Sequences

Sturm sequences are a well-known and useful tool for isolating the roots of any polynomial (cf. [24], [1]). Additionally, the reader can refer to [14], where Sturm sequences are used for comparing algebraic numbers of degree 2, or to [5], where the comparison of such numbers was done by the resultant. In the sequel, $D$ is a ring, $Q$ is its fraction field and $\overline{Q}$ the algebraic closure of $Q$. Typically $D = \mathbb{Z}$ and $Q = \mathbb{Q}$.

Definition 1. Let $P$ and $Q \in D[x]$ be nonzero polynomials. By a (generalized) Sturm sequence for $P, Q$ we mean any pseudo-remainder sequence $\mathbf{P} = (P_0, P_1, \ldots, P_n)$, $n \ge 1$, such that for all $i = 1, \ldots, n$, we have $a_i P_{i-1} = Q_i P_i + b_i P_{i+1}$ ($Q_i \in D[x]$, $a_i, b_i \in D$), such that $a_i b_i < 0$ and $P_0 = P$, $P_1 = Q$, $P_{n+1} = 0$. We usually write $\mathbf{P}_{P_0,P_1}$ if we want to indicate the first two terms in the sequence.

For a Sturm sequence $\mathbf{P}$, $V_{\mathbf{P}}(p)$ denotes the number of sign variations of the evaluation of the sequence at $p$. The last polynomial in $\mathbf{P}_{P_0,P_1}$ is the resultant of $P_0$ and $P_1$.

Theorem 2. Let $P, Q \in D[x]$ be relatively prime polynomials and $P$ square-free. If $a < b$ are both non-roots of $P$ and $\gamma$ ranges over the roots of $P$ in $[a,b]$, then

$$V_{P,Q}[a,b] := V_{P,Q}(a) - V_{P,Q}(b) = \sum_{\gamma} \mathrm{sign}\big(P'(\gamma)\, Q(\gamma)\big),$$

where $P'$ is the derivative of $P$.
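Theorem 2 can be checked on small cases with a Python sketch over exact rationals. It builds a generalized Sturm sequence in the sense of Definition 1 with the simplest admissible choice $a_i = 1$, $b_i = -1$. This means negated exact remainders over $\mathbb{Q}$, rather than pseudo-remainders over $D$. It then counts sign variations. Polynomials are coefficient lists, lowest degree first. All names are illustrative and not taken from the authors' implementation.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_eval(p, x):
    acc = Fraction(0)
    for c in reversed(p):  # Horner's rule
        acc = acc * x + c
    return acc


def poly_rem(a, b):
    """Remainder of a divided by b over the rationals (b nonzero)."""
    a, b = trim(Fraction(c) for c in a), trim(b)
    while a and len(a) >= len(b):
        f, shift = a[-1] / b[-1], len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= f * c
        a = trim(a)  # the leading term has been cancelled exactly
    return a


def sturm_sequence(P, Q):
    """P_0 = P, P_1 = Q, P_{i+1} = -rem(P_{i-1}, P_i): Definition 1 with
    a_i = 1 and b_i = -1, so that a_i * b_i < 0."""
    seq = [trim(Fraction(c) for c in P), trim(Fraction(c) for c in Q)]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def sign_variations(seq, x):
    """Number of sign changes of (P_0(x), ..., P_n(x)), ignoring zeros."""
    vals = [v for v in (poly_eval(p, x) for p in seq) if v != 0]
    return sum(1 for u, v in zip(vals, vals[1:]) if (u > 0) != (v > 0))


def V(P, Q, a, b):
    """V_{P,Q}[a, b] = V_{P,Q}(a) - V_{P,Q}(b)."""
    seq = sturm_sequence(P, Q)
    return sign_variations(seq, a) - sign_variations(seq, b)


# P = x^2 - 2, Q = x: both roots +-sqrt(2) contribute sign(P' Q) = +1.
print(V([-2, 0, 1], [0, 1], -2, 2))         # 2
# Q = P' gives the classical Sturm count: x^3 - x has 3 roots in [-2, 2].
print(V([0, -1, 0, 1], [-1, 0, 3], -2, 2))  # 3
```

With $Q = P'$ every summand is $\mathrm{sign}(P'(\gamma)^2) = 1$, which is why the second call counts roots.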

Corollary 3. Theorem 2 holds if, in place of $Q$, we use $R = \mathrm{PRem}(Q,P)$, where $\mathrm{PRem}(Q,P)$ stands for the pseudo-remainder of $Q$ divided by $P$.

Finding isolating intervals for the roots of any polynomial is done below. Still, the computation of a Sturm sequence is quite expensive. In order to accelerate the computation, we assume that the polynomials are $P, Q \in D[a_0, \ldots, a_n, b_0, \ldots, b_m][x]$, where $a_i, b_j$ are the coefficients considered as parameters, and we pre-compute various Sturm sequences ($n, m \le 4$).

The isolating-interval representation of a real algebraic number $\alpha \in \overline{Q}$ is $\alpha = (A(X), I)$, where $A(X) \in D[X]$ is square-free and $A(\alpha) = 0$, $I = [a,b]$, $a, b \in Q$, and $A$ has no other root in $I$.

Corollary 4. Assume $B(X) \in D[X]$, $\beta = B(\alpha)$, and that $\alpha = (A, [a,b])$. By Theorem 2, $\mathrm{sign}(B(\alpha)) = \mathrm{sign}(V_{A,B}[a,b] \cdot A'(\alpha))$.

Let us compare two algebraic numbers $\gamma_1 = (P_1(x), I_1)$ and $\gamma_2 = (P_2(x), I_2)$, where $I_1 = [a_1, b_1]$ and $I_2 = [a_2, b_2]$. Let $J = I_1 \cap I_2$. When $J = \emptyset$, or only one of $\gamma_1$ and $\gamma_2$ belongs to $J$, we can easily order the two algebraic numbers. All these tests are implemented by the previous corollary and theorem. If $\gamma_1, \gamma_2 \in J$, then $\gamma_1 \ge \gamma_2 \Leftrightarrow P_2(\gamma_1) \cdot P_2'(\gamma_2) \ge 0$. We can easily obtain the sign of $P_2'(\gamma_2)$, and from Theorem 2, we obtain the sign of $P_2(\gamma_1)$.
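The comparison rule can be sketched in Python with exact rationals, restricted to the main case $\gamma_1, \gamma_2 \in J$ discussed above. The sketch makes three assumptions. $J$ has rational endpoints that are non-roots of $P_1$ and $P_2$. Each $P_k$ is square-free with exactly one root in $J$. The sign of $A'(\alpha)$ equals that of $A(b)$ for the simple root $\alpha$ of $A$ in $[a,b]$. The Sturm sequence is again built from negated exact remainders, which is one admissible choice in Definition 1. If $\gamma_1 = \gamma_2$, the sequence carries a common factor, the variation count is 0, and so is the result. Function names are hypothetical, and the disjoint-interval cases are not covered.

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_eval(p, x):
    acc = Fraction(0)
    for c in reversed(p):  # coefficient lists are lowest degree first
        acc = acc * x + c
    return acc


def poly_rem(a, b):
    a, b = trim(Fraction(c) for c in a), trim(b)
    while a and len(a) >= len(b):
        f, shift = a[-1] / b[-1], len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= f * c
        a = trim(a)
    return a


def V(P, Q, a, b):
    """Sign variations of the sequence P, Q, -rem, ... at a minus those at b."""
    seq = [trim(Fraction(c) for c in P), trim(Fraction(c) for c in Q)]
    while (r := poly_rem(seq[-2], seq[-1])):
        seq.append([-c for c in r])

    def var(x):
        vals = [v for v in (poly_eval(p, x) for p in seq) if v != 0]
        return sum(1 for u, v in zip(vals, vals[1:]) if (u > 0) != (v > 0))

    return var(a) - var(b)


def sgn(x):
    return (x > 0) - (x < 0)


def sign_at(A, interval, B):
    """Corollary 4: sign(B(alpha)) for alpha = (A, [a, b])."""
    a, b = interval
    # For the simple root alpha of A in [a, b], sign(A'(alpha)) = sign(A(b)).
    return sgn(V(A, B, a, b)) * sgn(poly_eval(A, b))


def compare_in_J(P1, P2, J):
    """Sign of gamma1 - gamma2 when both lie in J:
    gamma1 >= gamma2  iff  P2(gamma1) * P2'(gamma2) >= 0."""
    return sign_at(P1, J, P2) * sgn(poly_eval(P2, J[1]))


J = (1, 2)
print(compare_in_J([-2, 0, 1], [-3, 2], J))           # sqrt(2) vs 3/2      -> -1
print(compare_in_J([-2, 0, 0, 1], [-2, 0, 1], J))     # 2^(1/3) vs sqrt(2)  -> -1
print(compare_in_J([-2, 0, 1], [-4, 0, 0, 0, 1], J))  # sqrt(2) vs sqrt(2)  -> 0
```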
4 Root Classification



Before applying the algorithms for root comparison, we analyze each polynomial 
by determining the number and the multiplicities of its real roots. For this, we 
use a system of discriminants. For the quadratic polynomial the discrimination 
system is trivial. For the cubic, it is well known [22,10]. We study the quartic by 
Sturm-Habicht sequences, while [23] used a resultant-like matrix. For background 
see [24,1]. We factorize the tested quantities by invariants and elements of the 
Bezoutian matrix. We use invariants in order to provide square-free polynomials 
defining the algebraic numbers, to compute the algebraic numbers as rationals 
if this is possible and finally to provide isolating rationals. 

Consider the quartic polynomial equation, where $a > 0$:

$$f(X) = aX^4 - 4bX^3 + 6cX^2 - 4dX + e. \qquad (1)$$

Throughout the paper, the input coefficients are $a, b, c, d, e \in D$.

For background on invariants see [4], [19]. We consider the rational invariants of $f$, i.e. the invariants under $GL(2, \mathbb{Q})$. They form a graded ring [4], generated by:

$$A = W_3 + 3\Delta_3, \qquad B = -dW_1 - e\Delta_2 - c\Delta_3. \qquad (2)$$

Every other invariant is an isobaric polynomial in $A$ and $B$, i.e. it is homogeneous in the coefficients of the quartic. We denote the invariant $A^3 - 27B^2$ by $\Delta_1$ and refer to it as the discriminant. The semivariants (the leading coefficients of the covariants) are $A$, $B$ and:

$$\Delta_2 = b^2 - ac, \qquad R = aW_1 + 2b\Delta_2, \qquad Q = 12\Delta_2^2 - a^2A. \qquad (3)$$



We also derived the following quantities. They are not necessarily invariants, but they are elements of the Bezoutian matrix of $f$ and $f'$. Recall that the determinant of the Bezoutian matrix equals the resultant, but the Bezoutian matrix is smaller than the Sylvester matrix [3].

$$\begin{aligned} \Delta_3 &= c^2 - bd, & T &= -9W_1^2 + 27\Delta_2\Delta_3 - 3W_3\Delta_2,\\ \Delta_4 &= d^2 - ce, & T_1 &= -W_3\Delta_2 - 3W_1^2 + 9\Delta_2\Delta_3,\\ W_1 &= ad - bc, & T_2 &= AW_1 - 9bB,\\ W_2 &= be - cd, & W_3 &= ae - bd. \end{aligned} \qquad (4)$$



Proposition 5. [23] Let $f(X)$ be as in (1). The table below gives the number of real roots and their multiplicities. In case (2) there are 4 complex roots, while in case (8) there are 2 complex double roots (in [23] there is a small error in defining $T$).



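(1) $\Delta_1 > 0 \wedge T > 0 \wedge \Delta_2 > 0$: $\{1,1,1,1\}$
(2) $\Delta_1 > 0 \wedge (T < 0 \vee \Delta_2 < 0)$: $\{\}$
(3) $\Delta_1 < 0$: $\{1,1\}$
(4) $\Delta_1 = 0 \wedge T > 0$: $\{2,1,1\}$
(5) $\Delta_1 = 0 \wedge T < 0$: $\{2\}$
(6) $\Delta_1 = 0 \wedge T = 0 \wedge \Delta_2 > 0 \wedge R = 0$: $\{2,2\}$
(7) $\Delta_1 = 0 \wedge T = 0 \wedge \Delta_2 > 0 \wedge R \neq 0$: $\{3,1\}$
(8) $\Delta_1 = 0 \wedge T = 0 \wedge \Delta_2 < 0$: $\{\}$
(9) $\Delta_1 = 0 \wedge T = 0 \wedge \Delta_2 = 0$: $\{4\}$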
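The discrimination system of Proposition 5 translates directly into code. The Python sketch below computes $\Delta_1$, $\Delta_2$, $R$ and $T$ from (2)-(4) with exact rationals and walks through the nine cases. `to_form1` is a hypothetical helper that rewrites ordinary coefficients into the binomial form (1). The sketch assumes $a > 0$ and raises an error for any sign pattern the table does not list.

```python
from fractions import Fraction


def sign(x):
    return (x > 0) - (x < 0)


def to_form1(c4, c3, c2, c1, c0):
    """(a, b, c, d, e) such that c4 X^4 + c3 X^3 + c2 X^2 + c1 X + c0 is written as in (1)."""
    return (Fraction(c4), -Fraction(c3) / 4, Fraction(c2) / 6,
            -Fraction(c1) / 4, Fraction(c0))


def quantities(a, b, c, d, e):
    """Delta_1, Delta_2, R and T from (2)-(4)."""
    D2, D3 = b * b - a * c, c * c - b * d
    W1, W3 = a * d - b * c, a * e - b * d
    A = W3 + 3 * D3
    B = -d * W1 - e * D2 - c * D3
    return (A ** 3 - 27 * B ** 2,                           # Delta_1
            D2,                                             # Delta_2
            a * W1 + 2 * b * D2,                            # R
            -9 * W1 ** 2 + 27 * D2 * D3 - 3 * W3 * D2)      # T


def classify(a, b, c, d, e):
    """Multiplicities of the real roots (a > 0), following the table of Proposition 5."""
    D1, D2, R, T = (sign(v) for v in quantities(a, b, c, d, e))
    if D1 > 0:
        if T > 0 and D2 > 0:
            return [1, 1, 1, 1]                  # case (1)
        if T < 0 or D2 < 0:
            return []                            # case (2)
        raise ValueError("sign pattern not listed in Proposition 5")
    if D1 < 0:
        return [1, 1]                            # case (3)
    if T > 0:
        return [2, 1, 1]                         # case (4)
    if T < 0:
        return [2]                               # case (5)
    if D2 > 0:
        return [2, 2] if R == 0 else [3, 1]      # cases (6) and (7)
    return [] if D2 < 0 else [4]                 # cases (8) and (9)
```

For example, $X^4 - 7X^3 + 17X^2 - 17X + 6 = (X-1)^2(X-2)(X-3)$ has $\Delta_1 = 0$ and $T > 0$, so it falls into case (4).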
5 Rational Isolating Points

In what follows, $f \in \mathbb{Z}[X]$ and $a > 0$; the same methods work for any computable real subfield $D$. It is known that, for the quadratic $f(X) = aX^2 - 2bX + c$, the rational number $\frac{b}{a}$ isolates the real roots.

Theorem 6. Consider the cubic $f(X) = aX^3 - 3bX^2 + 3cX - d$. The rational numbers $\frac{b}{a}$ and $\frac{bc - ad}{2(b^2 - ac)}$ isolate the real roots.

Proof. In [10], we derive the rational isolating points from the fact that the two extreme points and the inflexion point of $f(X)$ are collinear. Moreover, the line through these points intersects the $x$-axis at a rational number. Notice that the algebraic degree of the isolating points is at most 2. Interestingly, applying theorem 7 yields the same points. □
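The construction in the proof can be reproduced with exact polynomial division. Dividing $f$ by $f'/3 = aX^2 - 2bX + c$ leaves a linear remainder $r(X)$. Since $f = (X - \frac{b}{a})\,\frac{f'}{3} + r$, the line $y = r(X)$ passes through both extreme points and through the inflexion point $X = b/a$. The Python sketch below is an illustration that assumes $a > 0$ and $b^2 - ac \neq 0$. It returns that line's $x$-intercept together with $b/a$, and `closed_form` evaluates the same intercept as $\frac{bc - ad}{2(b^2 - ac)}$.

```python
from fractions import Fraction


def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def rem(num, den):
    """Exact remainder of num / den (constant term first)."""
    num, den = trim(num), trim(den)
    while num and len(num) >= len(den):
        factor, shift = num[-1] / den[-1], len(num) - len(den)
        for i, c in enumerate(den):
            num[i + shift] -= factor * c
        num = trim(num)
    return num


def cubic_isolating_points(a, b, c, d):
    """Isolating points of a X^3 - 3b X^2 + 3c X - d (a > 0, b^2 - ac != 0)."""
    f = [-d, 3 * c, -3 * b, a]                 # constant term first
    third_of_derivative = [c, -2 * b, a]       # f'/3 = a X^2 - 2b X + c
    # f = (X - b/a) * f'/3 + r(X): the line y = r(X) passes through both extreme
    # points (where f' = 0) and through the inflexion point X = b/a.
    r0, r1 = rem(f, third_of_derivative)
    crossing = -r0 / r1                        # where that line meets the x-axis
    inflexion = Fraction(b) / Fraction(a)
    return sorted([inflexion, crossing])


def closed_form(a, b, c, d):
    """The x-intercept written as (bc - ad) / (2 (b^2 - ac))."""
    return Fraction(b * c - a * d, 2 * (b * b - a * c))
```

For $3X^3 - 12X^2 + 9X = 3X(X-1)(X-3)$, i.e. $(a, b, c, d) = (3, 4, 3, 0)$, the sketch returns $6/7$ and $4/3$, which separate the roots 0, 1 and 3.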



Theorem 7. [21] Given a polynomial $P(X)$ with adjacent real roots $\gamma_1, \gamma_2$, and any two other polynomials $B(X), C(X)$, let $A(X) := B(X)P'(X) + C(X)P(X)$, where $P'$ is the derivative of $P$. Then $A(X)$ and $B(X)$ are called isolating polynomials, because at least one of them has at least one real root in the closed interval $[\gamma_1, \gamma_2]$. In addition, it is always possible to have $\deg A + \deg B \le \deg P - 1$.



We now study the case of the quartic. By theorem 7, it is clear how to isolate the roots by 2 quadratic algebraic numbers and a rational. To obtain an isolating polynomial, let $B(X) = aX - b$ and $C(X) = -4a$; then, up to the constant factor 4,

$$A(X) = 3\Delta_2X^2 + 3W_1X - W_3. \qquad (5)$$
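Eq. (5) is easy to confirm by expanding $B(X)P'(X) + C(X)P(X)$. The Python sketch below does this with plain coefficient lists, constant term first (our convention). With $B = aX - b$ and $C = -4a$, the result is $12\Delta_2X^2 + 12W_1X - 4W_3$, i.e. four times (5).

```python
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def derivative(p):
    return [i * p[i] for i in range(1, len(p))]


def isolating_poly(P, B, C):
    """A = B P' + C P of Theorem 7 (coefficient lists, constant term first)."""
    return trim(add(mul(B, derivative(P)), mul(C, P)))


def quartic(a, b, c, d, e):
    """Coefficients of a X^4 - 4b X^3 + 6c X^2 - 4d X + e, constant term first."""
    return [e, -4 * d, 6 * c, -4 * b, a]
```

Consider $12X^4 - 120X^3 + 420X^2 - 600X + 288 = 12(X-1)(X-2)(X-3)(X-4)$, i.e. $(a,b,c,d,e) = (12, 30, 70, 150, 288)$. The call `isolating_poly(quartic(12, 30, 70, 150, 288), [-30, 12], [-48])` yields `[4176, -3600, 720]`, whose roots are approximately 1.83 and 3.17.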



Since $\frac{b}{a}$ is the arithmetic mean of the 4 roots, it certainly lies somewhere between the roots. The other two isolating points are the solutions of (5), i.e.

$$\sigma_{1,2} = \frac{-3W_1 \pm \sqrt{9W_1^2 + 12\Delta_2W_3}}{6\Delta_2}. \qquad (6)$$



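We verify that $\operatorname{sign}(f(\frac{b}{a})) = \operatorname{sign}(a^2A - 3\Delta_2^2)$, so

$$\begin{cases} \sigma_1 < \frac{b}{a} < \sigma_2, & \text{if } f(\frac{b}{a}) > 0;\\ \sigma_1 < \sigma_2 < \frac{b}{a}, & \text{if } f(\frac{b}{a}) < 0 \wedge R > 0;\\ \frac{b}{a} < \sigma_1 < \sigma_2, & \text{if } f(\frac{b}{a}) < 0 \wedge R < 0; \end{cases} \qquad (7)$$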



where $R$ is from (3). If $f(\frac{b}{a}) = 0$, then we know exactly one root and can express the other three roots as roots of a cubic. To obtain another isolating polynomial, we use $B(X) = dX - e$ and $C(X) = -4d$, and obtain, up to the constant factor $-4$,

$$A(X) = W_3X^3 - 3W_2X^2 - 3\Delta_4X.$$



By theorem 7, at least 2 of the numbers below separate the roots:

$$0, \qquad \tau_{1,2} = \frac{3W_2 \pm \sqrt{9W_2^2 + 12\Delta_4W_3}}{2W_3}. \qquad (8)$$
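The closed form (6) and the ordering rule (7) can be exercised numerically. In the sketch below, `mean_test` returns $a^2A - 3\Delta_2^2$, which equals $a^3 f(\frac{b}{a})$ exactly. The square root in (6) is taken in floating point, so `sigmas` only approximates the roots: it is suitable for illustration, not for exact decisions. The function names are ours.

```python
import math
from fractions import Fraction as F


def f_at(a, b, c, d, e, x):
    return a * x ** 4 - 4 * b * x ** 3 + 6 * c * x ** 2 - 4 * d * x + e


def mean_test(a, b, c, d, e):
    """a^2 A - 3 Delta_2^2, which equals a^3 f(b/a) and so has the sign of f(b/a)."""
    D2 = b * b - a * c
    A = (a * e - b * d) + 3 * (c * c - b * d)
    return a * a * A - 3 * D2 * D2


def sigmas(a, b, c, d, e):
    """sigma_1 < sigma_2 from (6), in floating point (Delta_2 > 0 assumed)."""
    D2, W1, W3 = b * b - a * c, a * d - b * c, a * e - b * d
    r = math.sqrt(9 * W1 ** 2 + 12 * D2 * W3)
    return sorted([(-3 * W1 - r) / (6 * D2), (-3 * W1 + r) / (6 * D2)])


def mean_position(a, b, c, d, e):
    """Where b/a lies with respect to sigma_1 < sigma_2, following (7)."""
    s = mean_test(a, b, c, d, e)
    if s == 0:
        return "root"                       # b/a is itself a root of f
    if s > 0:
        return "between"                    # sigma_1 < b/a < sigma_2
    R = a * (a * d - b * c) + 2 * b * (b * b - a * c)
    if R > 0:
        return "right"                      # sigma_1 < sigma_2 < b/a
    if R < 0:
        return "left"                       # b/a < sigma_1 < sigma_2
    raise ValueError("R = 0 is not covered by (7)")
```

With roots 1, 2, 3, 4, the mean $5/2$ lies between $\sigma_1 \approx 1.83$ and $\sigma_2 \approx 3.17$. With roots 0, 1, 2, 10, we have $f(\frac{b}{a}) < 0$ and $R > 0$, so both $\sigma_1 \approx 0.58$ and $\sigma_2 \approx 1.78$ lie to the left of $b/a = 13/4$.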
We assume that the roots are > 0, so 0 is not an isolating point. The order of
the isolating points is determined similarly as in (7).
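The second isolating polynomial and the points $\tau_{1,2}$ of (8) can be checked the same way. The Python sketch below expands $(dX - e)f' - 4df$, which equals $-4(W_3X^3 - 3W_2X^2 - 3\Delta_4X)$. It then evaluates $\tau_{1,2}$ in floating point, which is only an illustrative approximation.

```python
import math


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def derivative(p):
    return [i * p[i] for i in range(1, len(p))]


def quartic(a, b, c, d, e):
    return [e, -4 * d, 6 * c, -4 * b, a]          # constant term first


def second_isolating_poly(a, b, c, d, e):
    """(d X - e) f' - 4d f, i.e. Theorem 7 with B = dX - e and C = -4d."""
    f = quartic(a, b, c, d, e)
    return trim(add(mul([-e, d], derivative(f)), mul([-4 * d], f)))


def taus(a, b, c, d, e):
    """Nonzero roots tau_1 < tau_2 of W3 X^2 - 3 W2 X - 3 Delta_4, i.e. (8), in floating point."""
    W2, W3, D4 = b * e - c * d, a * e - b * d, d * d - c * e
    r = math.sqrt(9 * W2 ** 2 + 12 * D4 * W3)
    return sorted([(3 * W2 - r) / (2 * W3), (3 * W2 + r) / (2 * W3)])
```

For $(a,b,c,d,e) = (12, 30, 70, 150, 288)$, whose roots are 1, 2, 3, 4, the sketch gives $\tau_1 \approx 2.03$ and $\tau_2 \approx 3.32$.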

Let us now find rational isolating points for all relevant cases of prop. 5. 

{1, 1, 1, 1} Treated below. 

{1, 1} Since {1, 1, 1, 1} is harder, we do not examine it explicitly. 

{2, 1, 1} The double root is rational, since it is the only root of GCD($f$, $f'$), and its value can be expressed with the quantities of (4). In theory, we could divide it out and use the isolating points of the cubic, but in practice we avoid division. When the double root is the middle root, the rational isolating points are obtained directly; otherwise, we use theorem 7 to find one more isolating point in $\mathbb{Q}$.

{2} Compute the double root; it is rational as a root of GCD($f$, $f'$).

{2, 2} The roots are the smallest and the largest root of the derivative, i.e. of a cubic. Alternatively, we express them as the roots of $3\Delta_2X^2 + 3W_1X - W_3$.

{3, 1} The triple root is $-\frac{W_1}{2\Delta_2}$ and the single root is $\frac{3aW_1 + 8b\Delta_2}{2a\Delta_2}$.

{4} The only real root is $\frac{b}{a} \in \mathbb{Q}$.
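The degenerate cases with closed forms can be checked directly. The sketch below assumes that the root pattern has already been determined, e.g., by Proposition 5. The triple-root formula $-W_1/(2\Delta_2)$ follows from writing $f = a(X - r)^3(X - s)$: then $W_1 = -a^2 r(r-s)^2/8$ and $\Delta_2 = a^2(r-s)^2/16$. The single root then follows from the root sum $4b/a$.

```python
import math
from fractions import Fraction


def to_form1(c4, c3, c2, c1, c0):
    """(a, b, c, d, e) of form (1) for c4 X^4 + c3 X^3 + c2 X^2 + c1 X + c0."""
    return (Fraction(c4), -Fraction(c3) / 4, Fraction(c2) / 6,
            -Fraction(c1) / 4, Fraction(c0))


def triple_and_single(a, b, c, d, e):
    """Case {3, 1}: exact triple root -W1/(2 Delta_2) and single root 4b/a - 3 * triple."""
    D2, W1 = b * b - a * c, a * d - b * c
    triple = -W1 / (2 * D2)
    return triple, 4 * b / a - 3 * triple


def quadruple(a, b, c, d, e):
    """Case {4}: the only real root b/a."""
    return b / a


def double_pair(a, b, c, d, e):
    """Case {2, 2}: roots of 3 Delta_2 X^2 + 3 W1 X - W3, in floating point."""
    D2, W1, W3 = b * b - a * c, a * d - b * c, a * e - b * d
    r = math.sqrt(9 * W1 ** 2 + 12 * D2 * W3)
    return sorted([(-3 * W1 - r) / (6 * D2), (-3 * W1 + r) / (6 * D2)])
```

`double_pair` uses floating point. For exact work, the quadratic should instead be kept as the defining polynomial of the two algebraic numbers.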

It remains to consider the case where the quartic has 4 simple real roots. We assume that 0 is not a root (otherwise we deal with a cubic); therefore, $e \neq 0$. WLOG, we may consider equation (1) with $b = 0$, and specialize equations (6) and (8) accordingly. The only difficult case is when $\tau_i$ and $\sigma_j$, $i, j \in \{1, 2\}$, isolate the same pair of adjacent roots. WLOG, assume that these are $\tau_1, \sigma_1$. We combine them by the following lemma.

Lemma 8. For any $m, n, m', n' \in \mathbb{N}^*$, $0 < \frac{m}{n} < \frac{m'}{n'} \Rightarrow \frac{m}{n} < \frac{m + m'}{n + n'} < \frac{m'}{n'}$.
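Lemma 8 is the classical mediant inequality. The Python sketch below checks it on exact fractions. It also computes $K = \lceil\sqrt{A}\rceil$ using integer arithmetic only (`math.isqrt`), and returns $K$ when $K \le \sqrt{B}$, i.e. when $K^2 \le B$. This mirrors the choice of $K$ made for $\sigma_i \oplus \tau_j$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def mediant(m, n, mp, np_):
    """(m + m') / (n + n'), which lies strictly between m/n and m'/n' (Lemma 8)."""
    return Fraction(m + mp, n + np_)


def ceil_sqrt(A):
    """Smallest integer K with K*K >= A, for an integer A >= 1."""
    return isqrt(A - 1) + 1


def integer_between_roots(A, B):
    """Return K = ceil(sqrt(A)) if K lies in [sqrt(A), sqrt(B)], else None."""
    K = ceil_sqrt(A)
    return K if K * K <= B else None
```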

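Let

$$A := 9\Delta_4 - 3ce, \qquad B := 12ae\Delta_4 + 9c^2d^2; \qquad (9)$$

these are the radicands of (6) (up to the factor $a^2$) and of (8) when $b = 0$. An isolating point is then obtained by combining $\sigma_1$ and $\tau_1$ with Lemma 8. If we find an integer $K \in [\sqrt{A}, \sqrt{B}]$, then it suffices to replace $\sqrt{A} + \sqrt{B}$ by $2K$. We denote the resulting rational by $\sigma_i \oplus \tau_j$; notice that it has degree 2 in the input coefficients. By prop. 5(1), $\Delta_2 = -ac > 0$, i.e. $c < 0$. Descartes' rule implies that, if $e > 0$, then there are 2 positive and 2 negative roots. If $e < 0$, there are 3 positive roots and one negative root, or vice versa. We set $K = \lceil\sqrt{A}\rceil$ to prove theorem 10, provided the following holds: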

Theorem 9. For every quartic in $\mathbb{Z}[X]$ with 4 distinct real roots and $b = 0$, we have $\sqrt{B} - \sqrt{A} > 1$, using notation (9).

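Proof. Since $A$ is a positive integer, $1/\sqrt{A} \le 1$, hence

$$\sqrt{B} > 1 + \sqrt{A} \iff \sqrt{\tfrac{B}{A}} > 1 + \tfrac{1}{\sqrt{A}} \Leftarrow \sqrt{\tfrac{B}{A}} > 2 \iff g := 4aed^2 - 4ace^2 + 3d^2c^2 - 12d^2 + 16ce > 0,$$

where $g = (B - 4A)/3$. First, we show that the minimum of $g(a, c, d, e)$ is positive subject to $-a \le -1$, $c \le -5$ and $-e \le -5$; we treat the case where $c > -5$ and $e < 5$ later. We introduce slack variables $y_1, y_2, y_3$ and use Lagrange multipliers, so our problem becomes

$$\min L(a, c, d, e, y_1, y_2, y_3, \lambda_1, \lambda_2, \lambda_3) := \min\,\big[g(a, c, d, e) + \lambda_1(c + y_1^2 + 5) + \lambda_2(-e + y_2^2 + 5) + \lambda_3(-a + y_3^2 + 1)\big].$$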
We take partial derivatives and equate them to zero. The solution of the system, computed with MAPLE 9, is $(a, c, d, e) = (1, -5, 0, 5)$. It is a local minimum with $g(1, -5, 0, 5) = 100 > 0$ (equivalently, $B - 4A = 3g = 300$). If $-5 < c < 0$ and $0 < e < 5$, we check exhaustively that $\sqrt{B} - \sqrt{A} > 1$. If $e < 0$, we again use Lagrange multipliers, but with the constraint $e \le -1$. □
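The algebra behind the proof can be double-checked. With $b = 0$, one has $B - 4A = 3g$, so $g > 0$ is equivalent to $B > 4A$, i.e. $\sqrt{B} > 2\sqrt{A}$, which implies $\sqrt{B} > 1 + \sqrt{A}$ whenever $A \ge 1$. The Python sketch below verifies the identity on a few integer tuples and evaluates the quantities at the reported minimiser $(1, -5, 0, 5)$, where $g = 100$ and $B - 4A = 300$. It does not re-run the Lagrange-multiplier argument.

```python
def radicand_A(a, c, d, e):
    """A of (9) with b = 0: 9*Delta_4 - 3ce = 9d^2 - 12ce."""
    return 9 * d * d - 12 * c * e


def radicand_B(a, c, d, e):
    """B of (9) with b = 0: 12 a e Delta_4 + 9 c^2 d^2."""
    return 12 * a * e * (d * d - c * e) + 9 * c * c * d * d


def g(a, c, d, e):
    """The polynomial g of the proof of Theorem 9."""
    return 4 * a * e * d * d - 4 * a * c * e * e + 3 * d * d * c * c - 12 * d * d + 16 * c * e
```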

Theorem 10. Consider a quartic as in (1). At least three of the rational numbers $\{0, \sigma_i \oplus \tau_j\}$, $i, j \in \{1, 2\}$, isolate the real roots.

6 Complexity of Computations 

Comparing the roots of two polynomials of degree $d \le 4$ using Sturm sequences and isolating intervals yields the following bounds on the degree of the tested quantities. We measure degree in terms of the coefficients of the input quartics. A lower bound is the degree of the resultant coefficients, which is $2d$ in terms of the input coefficients. Recall that the resultant is an irreducible polynomial in the coefficients of an overconstrained system of $n + 1$ polynomials in $n$ variables, whose vanishing is the minimum condition for the system to be solvable ([24], [1]).

There is a straightforward algorithm for the comparison of quadratic alge- 
braic numbers, with maximum algebraic degree 4, hence optimal ([14], [5]). 

Theorem 11. [10] There is an algorithm for the comparison of algebraic cubic numbers (including all degenerate cases), with maximum algebraic degree 6, hence optimal.



Theorem 12. There is an algorithm that compares any two roots of two square- 
free quartics with algebraic degree 8 or 9, depending on the degree of the isolating 
points. When the quartics are not square-free, the algebraic degree is between 8 
and 13. The algorithm needs at most 172 additions and multiplications in order 
to decide the order. These bounds cover all degenerate cases, including when one 
polynomial drops degree. 

Proof. (Sketch) In [10], we derive this bound by considering the evaluation of the Sturm sequence polynomials over the isolating points (see the discussion after cor. 4 for details). The isolating points have degree at most 2. The sequence has length at most 6, so at most 12 polynomial evaluations are needed, hence a fixed number of operations. The number of operations is obtained with MAPLE's function cost. □

7 Applications 

We have implemented a software package, including a root_of class for algebraic numbers of degree up to 4, as part of the library SYNAPS (v2.1) [6]. Some functionalities pertain to higher degrees. Our implementation is generic in the sense that it can be used with any number type and any polynomial class that supports elementary operations and evaluations. It handles all degenerate cases and has been extended to arbitrary degree (though without isolating points for now). We developed programs that produce all possible sign combinations of the tested quantities, so that we test as few quantities as possible. These programs also produce both C++ code and pseudo-code for the comparison and sign determination functions. We provide the following functionalities, where UPoly and BPoly stand for arbitrary univariate and quadratic bivariate polynomials, respectively:

Sturm(UPoly f1, UPoly f2) Compute the Sturm sequence of f1, f2 by pseudo-remainders, (optimized) subresultants, or Sturm-Habicht.

compare(root_of α, root_of β) Compare 2 algebraic numbers of degree ≤ 4 using precomputed Sturm sequences. We have precomputed all the possible sign variations, so as to decide with the minimum number of tests (and faster than with Sturm-Habicht sequences).

sign_at(UPoly f, root_of α) Determine the sign of a univariate polynomial of arbitrary degree over an algebraic number of degree ≤ 4. We use the same techniques as in compare.

sign_at(BPoly f, root_of γ1, root_of γ2) Here γ1 = (P1(x), I1) and γ2 = (P2(x), I2), where I1 = [a1, b1] and I2 = [a2, b2], are of degree ≤ 4. We compute the Sturm-Habicht sequence of P1 and f wrt X. We specialize the sequence for X = a1 and X = b1, and find the sign of all y-polynomials over γ2 as above. The Sturm-Habicht sequence was computed in MAPLE, using the elements of the Bezoutian matrix. Additionally, we used the packages codegeneration and optimize to produce efficient C++ code.

solve(UPoly f) Returns the roots of f as root_of. If f has degree ≤ 4, we use the discrimination system; otherwise, we use non-static Sturm sequences.

solve(BPoly f1, BPoly f2) We consider the resultants Rx, Ry of f1, f2, obtained by eliminating y and x respectively; these are degree-4 polynomials in x and y. The isolating points of Rx, Ry define a grid of boxes in which the intersection points are located. The grid has 1 to 4 rows and 1 to 4 columns. It remains to decide, for each box, whether it is empty and, if not, whether it contains a simple or a multiple root. By prop. 5, multiple roots of the resultants are either rational or quadratic algebraic numbers. Each box contains at most one intersection point. We can decide whether a box is empty by 2 calls to the bivariate sign_at function. A significant optimization comes from noticing that the number of intersections in a column (row) can exceed neither 2 nor the multiplicity of the algebraic number. This lets us exclude various boxes.
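The grid-of-boxes step of solve(BPoly, BPoly) can be outlined as follows. This Python sketch is ours, not SYNAPS code. It cuts the given $x$- and $y$-ranges at the isolating points of $R_x$ and $R_y$ and forms all column/row boxes. It then keeps the boxes that a caller-supplied emptiness test does not reject; that test stands in for the two bivariate sign_at calls.

```python
from itertools import product


def intervals(points, lo, hi):
    """Consecutive intervals of [lo, hi] cut at the sorted isolating points."""
    cuts = [lo, *sorted(points), hi]
    return list(zip(cuts, cuts[1:]))


def candidate_boxes(x_points, x_range, y_points, y_range, is_empty):
    """Grid boxes (column, row) that survive the caller's emptiness test.

    In the text the emptiness test is two calls to the bivariate sign_at;
    here it is an arbitrary callback (hypothetical interface).
    """
    columns = intervals(x_points, *x_range)
    rows = intervals(y_points, *y_range)
    return [(col, row) for col, row in product(columns, rows) if not is_empty(col, row)]
```

Consider the circle $x^2 + y^2 = 2$ and the parabola $y = x^2$, which meet at $(\pm 1, 1)$. The resultant in $x$ has real roots $\pm 1$, isolated by 0. The resultant in $y$ has real roots $-2$ and 1, isolated by, e.g., $-1/2$. Of the four boxes, only the two upper ones survive.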

Unlike [2], where the boxes cannot contain any critical points of the intersecting conics, our algorithm makes no such assumption; hence, there is no need to refine the boxes. Our approach can be extended to compute intersection points of bivariate polynomials of arbitrary degree, provided that we obtain isolating points for the roots of the two resultants, either statically (as above) or dynamically.
Table 2. Running times in msec for comparing specific roots of 2 quartics. In parentheses are the number types: gmpq / gmp mpf stand for GMP rationals / floating point. The first five data columns form the left part and the last five the right part of the tests described in Section 8; "-" marks entries that are not reported.

                        Left part                              Right part
msec                    Z      Q      far    M-Q    M          Z      Q      far    M-Q    M
AXIOM                   38.0   57.4   77.3   67.2   82.3       -      -      -      -      -
MAPLE 9                 33.2   52.4   69.0   75.5   74.7       -      -      -      -      -
CORE                    8.41   5.67   6.76   9.31   10.1       -      -      -      -      -
SYNAPS (gmpq)           1.097  0.820  0.596  1.480  2.114      2.687  2.693  2.780  3.764  5.698
[11] (gmpq)             1.249  0.921  0.991  1.582  1.544      23.74  64.4   7.3    4.121  54.7
[11]-FILT (gmpq)        0.346  0.301  0.279  0.313  0.320      1.594  2.130  1.035  2.758  3.980
Our code (gmpq)         0.077  0.083  0.082  0.074  0.077      0.117  0.190  0.115  0.161  0.129
SYNAPS (gmp mpf)        0.302  0.202  0.195  0.339  0.385      -      -      -      -      -
Our code (gmp mpf)      0.060  0.064  0.057  0.063  0.061      -      -      -      -      -
SYNAPS (double)         0.055  0.042  0.043  0.061  0.071      -      -      -      -      -
Our code (double)       0.010  0.011  0.012  0.010  0.010      -      -      -      -      -


8 Experimental Results 

We performed tests on a 2.6 GHz Pentium with 512 MB of memory. The results are in table 2. AXIOM refers to the implementation of real algebraic arithmetic by Rioboo (current CVS version); like MAPLE 9, it is meant only for a rough comparison. The package in [11] uses subdivision based on Sturm sequences. Row [11]-FILT is the same, but the program first tries to decide with double arithmetic and switches to exact arithmetic if it cannot. Our code is implemented in SYNAPS 2.1.

Every test is averaged over 10000 polynomial pairs. The left part of the table includes results for polynomials with 4 random real roots between -20 and 20, multiplied by an integer ∈ [1, 20], so the coefficients are integers of absolute value ≤ 32·10^5. The Mignotte polynomials are $a(x^4 - 2(Lx - 1)^2)$, where $a, L$ are chosen randomly from [3, 30]. The right part of the table tests polynomials with random roots ∈ [10 • 10^, 11 • 10^] and Mignotte polynomials with parameters a, L both chosen randomly from [10 • 10^, 11 • 10^] with uniform distribution.

Column Z indicates polynomials with 4 integer roots. Column Q indicates polynomials with 4 rational roots. Column far indicates that one polynomial has only negative roots while the other one has only positive ones. M-Q indicates that one is a Mignotte polynomial and the other has 4 rational roots. M indicates that we compare the roots of two Mignotte polynomials.

CORE cannot handle multiple roots. Neither can the iterative solver of SYNAPS, which also has problems when the roots are endpoints of a subdivision. MAPLE cannot always handle equality for the roots of two Mignotte polynomials. Of course, all the implementations have problems when the number type is not exact, which is the case for double and gmp mpf. In the left part of table 2, our code is 103, 15 and 16 times faster than CORE, SYNAPS and [11], respectively, in the average case and when no filtering is used. Even with filtering ([11]-FILT), our code is faster by a factor of 4 on average. Increasing the magnitude of the coefficients or decreasing the separation bound of the roots makes the efficiency gain of our implementation over the others even more dramatic; this is the case for the right part of table 2.

Our code has similar running times for all kinds of tests. Since we use the discrimination system, it handles degenerate cases faster than the general ones, which is not the case for any of the other approaches. With an exact number type, our code is less than 8 times slower than with doubles. However, the latter does not offer any guarantee for the computed result. This indicates that exact arithmetic imposes a reasonable overhead when combined with efficient algorithms and a careful implementation.
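The speed-up factors quoted above can be recomputed from Table 2. The text does not state how the factors were averaged. The Python sketch below assumes the mean of the per-column ratios over the five left-hand columns. This reproduces 103, 15 and 16 after truncation and 4 after rounding. The sketch also checks that our exact (gmpq) timings stay within a factor of 8 of the double timings.

```python
from statistics import mean

# Left part of Table 2 (msec); columns Z, Q, far, M-Q, M.
OTHERS = {
    "CORE": [8.41, 5.67, 6.76, 9.31, 10.1],
    "SYNAPS (gmpq)": [1.097, 0.820, 0.596, 1.480, 2.114],
    "[11] (gmpq)": [1.249, 0.921, 0.991, 1.582, 1.544],
    "[11]-FILT (gmpq)": [0.346, 0.301, 0.279, 0.313, 0.320],
}
OURS_GMPQ = [0.077, 0.083, 0.082, 0.074, 0.077]
OURS_DOUBLE = [0.010, 0.011, 0.012, 0.010, 0.010]


def average_speedup(times, ours=OURS_GMPQ):
    """Mean over the columns of other_time / our_time (assumed averaging convention)."""
    return mean(t / o for t, o in zip(times, ours))


def exact_overhead():
    """Largest per-column ratio gmpq / double for our code."""
    return max(g / d for g, d in zip(OURS_GMPQ, OURS_DOUBLE))
```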

9 Future Work 

Consider the quintic $ax^5 + 10cx^3 - 10dx^2 + 5ex - f$, with $a > 0$, and $c < 0$ since we assume 5 real roots. Using the techniques above, we obtain 2 pairs of isolating polynomials. Two of them are

$$B_1 = 2acX^2 + 3adX + 8c^2, \qquad B_2 = (-4df + 4e^2)X^2 + efX - f^2. \qquad (11)$$

One pair of roots may be separated by the roots of $B_1, B_2$. We combine them by finding an integer between the square roots of the respective discriminants, $\Delta_{B_1} = a(9ad^2 - 64c^3)$ and $\Delta_{B_2} = (17e^2 - 16df)f^2$. It suffices to set $K$ to $\lceil\sqrt{\Delta_{B_1}}\rceil$ or $\lceil\sqrt{\Delta_{B_2}}\rceil$, depending on the signs of $e, f$. Tests with MAPLE support our choices.
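The two discriminants can be verified by direct expansion. The Python sketch below computes the discriminants of $B_1$ and $B_2$ from (11) with integer arithmetic and provides a ceiling square root for choosing $K$. It does not decide which of the two values of $K$ applies, since that depends on the signs of $e$ and $f$.

```python
from math import isqrt


def discriminant(p2, p1, p0):
    """p1^2 - 4 p2 p0 for p2 X^2 + p1 X + p0."""
    return p1 * p1 - 4 * p2 * p0


def B1(a, c, d, e, f):
    return (2 * a * c, 3 * a * d, 8 * c * c)          # 2ac X^2 + 3ad X + 8c^2


def B2(a, c, d, e, f):
    return (4 * e * e - 4 * d * f, e * f, -f * f)    # (4e^2 - 4df) X^2 + ef X - f^2


def ceil_sqrt(n):
    """Smallest integer K with K*K >= n, for an integer n >= 1."""
    return isqrt(n - 1) + 1
```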

Filtering should lead to running times similar to those of the double number 
type. Additionally we are planning to compare our software against the software 
of [18] and NiX, the polynomial library of EXACUS. 

The algebraic degree of the resultant is a tight lower bound in checking 
solvability, but is it tight for comparisons? 
Abstract. Arrangements of planar curves are fundamental structures in computational geometry. We describe the recent developments in the arrangement package of Cgal, the Computational Geometry Algorithms Library, which make it easier to use, to extend and to adapt to a variety of applications. This improved flexibility of the code does not come at the expense of efficiency, as we mainly use generic-programming techniques, which make dexterous use of the compilation process. On the contrary, we expedited key operations, as we demonstrate by experiments.



1 Introduction 

Given a set C of planar curves, the arrangement A(C) is the subdivision of the plane induced by the curves in C into maximally connected cells. The cells can be 0-dimensional (vertices), 1-dimensional (edges) or 2-dimensional (faces). The planar map of A(C) is the embedding of the arrangement as a planar graph, such that each arrangement vertex corresponds to a planar point and each edge corresponds to a planar subcurve of one of the curves in C. Arrangements and planar maps are ubiquitous in computational geometry and have numerous applications (see, e.g., [8,15] for some examples). Many potential users in academia and in industry may therefore benefit from a generic implementation of a software package that constructs and maintains planar arrangements.

Cgal [1], the Computational Geometry Algorithms Library, is a software 
library, which is the product of a collaborative effort of several sites in Europe 
and Israel, aiming to provide a generic and robust, yet efficient, implementation 
of widely used geometric data structures and algorithms. The library consists of 
a geometric kernel [11,18], that in turn consists of constant-size non-modifiable 
geometric primitive objects (such as points, line segments, triangles etc.) and 
RTD (FET Open) Projects under Contract No IST-2000-26473 (EGG: Effective
Computational Geometry for Curves and Surfaces) and No IST-2001-39250 (MOVIE:
Motion Planning in Virtual Environments), by The Israel Science Foundation
founded by the Israel Academy of Sciences and Humanities (Center for Geometric
Computing and its Applications), and by the Hermann Minkowski-Minerva Center
for Geometry at Tel Aviv University.



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 664–676, 2004.
© Springer-Verlag Berlin Heidelberg 2004



Code Flexibility and Program Efficiency by Genericity 665 



predicates and operations on these objects. On top of the kernel layer, the li- 
brary consists of a collection of modules, which provide implementations of many 
fundamental geometric data structures and algorithms. The arrangement pack- 
age is a part of this layer. 

In the classic computational geometry literature, two assumptions are usually made to simplify the design and analysis of geometric algorithms. First, inputs are in “general position”, that is, there are no degenerate cases (e.g., three curves intersecting at a common point) in the input. Second, operations on real numbers yield accurate results (the “real RAM” model, which also assumes that each basic operation takes constant time). Unfortunately, these assumptions do not hold in practice. Thus, an algorithm implemented from a textbook may yield incorrect results, get into an infinite loop or just crash while running on a degenerate, or nearly degenerate, input (see [25] for examples).

Over the last decade, the need for robust software implementations of computational-geometry algorithms has driven many researchers to develop variants of the classic algorithms that are less susceptible to degenerate inputs. At the same time, advances in computer algebra enabled the development of efficient software libraries that offer exact arithmetic on unbounded integers, rational numbers (Gmp, Gnu's multi-precision library [5]) and even algebraic numbers (the Core [2] library and the numerical facilities of Leda [6]). These exact number types serve as fundamental building blocks in the robust implementation of many geometric algorithms.

Keyser et al. [21] implemented an arrangement-construction module for alge- 
braic curves as part of the Mapc library. However, their implementation makes 
some general position assumptions. The Leda library [6] includes geometric 
facilities that allow the construction and maintenance of planar maps of line 
segments. LEDA-based implementations of arrangements of conic curves and of 
cubic curves were developed under the EXACUS project [3]. 

Cgal's arrangement package was the first generic software implementation designed for constructing arrangements of arbitrary planar curves and supporting operations and queries on such arrangements. More details on the design and implementation of this package can be found in [12,17]. In this paper, we summarize the recent efforts put into the arrangement package and show the improvements achieved. These are a software design relying on the generic-programming paradigm that is more modular and easier to use, and an implementation that is more extensible, adaptable, and efficient.

The rest of this paper is organized as follows. Section 2 provides the required background on Cgal's arrangement package and introduces the key points of its architecture, with special attention to the traits concept. In Section 3, we demonstrate the use of generic-programming techniques to improve modularity and flexibility, or to gain functionality that did not exist in the first place. In Section 4, we describe the principle of a meta-traits class and show how it considerably simplifies the curve hierarchy of the arrangement (as introduced in [17]). We present some experimental results in Section 5. Finally, Section 6 gives concluding remarks and suggestions for future research.
2 Preliminaries

2.1 The Arrangement Module Architecture 

The Planar_map_2<Dcel,Traits>¹ class template represents the planar embedding of a set of x-monotone planar curves that are pairwise disjoint in their interiors. It is derived from the Topological_map class, which provides the necessary combinatorial capabilities for maintaining the planar graph, while associating geometric data with the vertices, edges and faces of the graph. The planar map is represented using a doubly-connected edge list (Dcel for short), a data structure that enables efficient maintenance of two-dimensional subdivisions (see [8,20]).

The Planar_map_2 class template should be instantiated with two parameters. The first is a Dcel class, which represents the underlying topological data structure. The second is a traits class, which provides the geometric functionality and is tailored to handle a specific family of curves. The traits class encapsulates implementation details, such as the number type used, the coordinate representation and the geometric or algebraic computation methods. The two template parameters separate the topological and geometric aspects of the planar subdivision. This separation is advantageous, as it allows users to employ the package with their own special type of curves without any expertise in computational geometry. They only need to supply the traits methods, which mainly involve algebraic computations. Indeed, several of the package users are not familiar with computational-geometry techniques and algorithms.

The Planar_map_with_intersections_2 class template should be instantiated with a Planar_map_2 class. It inherits from the planar-map class and extends its functionality by enabling insertion of intersecting curves that are not necessarily x-monotone. The main idea is to break each input curve into several x-monotone subcurves and treat each subcurve separately. Each subcurve is in turn split at its intersection points with other curves. The result is a set of x-monotone, pairwise disjoint subcurves that induce a planar subdivision equivalent to the arrangement of the original input curves [12]. An arrangement of a set of curves can be constructed incrementally, inserting the curves one by one, or aggregately, using a sweep-line algorithm (see, e.g., [8]). Once the arrangement is constructed, point-location queries on it can be answered using different point-location strategies (see [12] for more details). Additional interface functions that modify, traverse, and display the map are available as well.
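As a concrete instance of the first step, breaking an input curve into x-monotone subcurves, the following Python sketch splits a polyline at the vertices where its x-direction reverses. It is an illustration under simplifying assumptions (no vertical segments and no repeated consecutive vertices) and does not reproduce the Cgal polyline traits class [16].

```python
def x_monotone_pieces(polyline):
    """Split a polyline, given as a list of (x, y) vertices, into x-monotone pieces.

    A new piece starts at every vertex where the x-direction of travel reverses.
    Assumes no vertical segments and no repeated consecutive vertices.
    """
    pieces, current, direction = [], [polyline[0]], 0
    for p, q in zip(polyline, polyline[1:]):
        step = (q[0] > p[0]) - (q[0] < p[0])   # +1 moving right, -1 moving left
        if direction and step != direction:
            pieces.append(current)               # close the piece at vertex p
            current = [p]
        current.append(q)
        direction = step
    pieces.append(current)
    return pieces
```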

The Arrangement_2 class template allows the construction of a planar map with intersections while maintaining a hierarchy of curve history. The input curves stand at the top level of the hierarchy. Each curve is subdivided into several x-monotone subcurves, which form the second level of the hierarchy. As indicated above, these x-monotone subcurves are further split so that no two subcurves intersect in their interiors. These subcurves form the lowest level of the curve hierarchy. See [17] for more details on the design of the Arrangement_2 template. The curve hierarchy is essential in a variety of applications that use arrangements; robot motion planning is one example.

¹ Cgal prescribes the suffix _2 for all data structures of planar objects as a convention.
2.2 The Traits Class

As mentioned in the previous subsection, the Planar_map_2 class template is parameterized with a traits class that defines the abstract interface between planar maps and the geometric primitives they use. Myers [23] gave the name “traits” to the concept of a class that supports certain predefined methods and is passed as a parameter to another class template. In our case, the geometric traits class defines the family of curves handled. The traits also determine other details: the number type used to represent coordinate values, the type of coordinate system (Cartesian, homogeneous, polar), the algebraic methods used, and whether extraneous data is stored with the geometric objects. A class that follows the geometric traits-class concept defines two types of objects, namely X_monotone_curve_2 and Point_2. The former represents an x-monotone curve; the latter is the type of the curve endpoints and represents a point in the plane. In addition, the concept lists a minimal set of predicates on objects of these two types, sufficient to enable the operations provided by the Planar_map_2 class:

1. Compare two points by their x-coordinates only, or lexicographically (by their x- and then by their y-coordinates).

2. Given an x-monotone curve C and a point p = (x0, y0) such that x0 is in the x-range of C (namely, x0 lies between the x-coordinates of C's endpoints), determine whether p is above, below or on C.

3. Compare the y-values of two x-monotone curves C1, C2 at a given x-value in the x-range of both curves.

4. Given two x-monotone curves C1, C2 and one of their intersection points p, determine the relative positions of the two curves immediately to the right of p, or immediately to the left of p.²
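To make the four predicates concrete, here is a hypothetical Python analogue of a traits class for non-vertical line segments with exact rational coordinates. The method names are ours and are not Cgal's interface. For segments, predicate 4 reduces to comparing slopes at the common point.

```python
from fractions import Fraction


def sign(v):
    return (v > 0) - (v < 0)


class SegmentTraits:
    """Hypothetical Python analogue of a traits class for non-vertical segments.

    A point is (x, y); a segment is ((x1, y1), (x2, y2)) with x1 < x2.
    Integer or Fraction coordinates keep every predicate exact.
    """

    @staticmethod
    def compare_x(p, q):
        """Predicate 1: compare by x-coordinates only."""
        return sign(p[0] - q[0])

    @staticmethod
    def compare_xy(p, q):
        """Predicate 1: lexicographic comparison, x first, then y."""
        return sign(p[0] - q[0]) or sign(p[1] - q[1])

    @staticmethod
    def y_at_x(seg, x):
        (x1, y1), (x2, y2) = seg
        return y1 + Fraction(y2 - y1) * (x - x1) / (x2 - x1)

    @classmethod
    def point_position(cls, seg, p):
        """Predicate 2: +1 if p is above seg, -1 if below, 0 if on it."""
        return sign(p[1] - cls.y_at_x(seg, p[0]))

    @classmethod
    def compare_y_at_x(cls, s1, s2, x):
        """Predicate 3: sign of y(s1) - y(s2) at the abscissa x."""
        return sign(cls.y_at_x(s1, x) - cls.y_at_x(s2, x))

    @staticmethod
    def slope(seg):
        (x1, y1), (x2, y2) = seg
        return Fraction(y2 - y1) / (x2 - x1)

    @classmethod
    def compare_right(cls, s1, s2, p):
        """Predicate 4: +1 if s1 is above s2 just to the right of their common point p."""
        return sign(cls.slope(s1) - cls.slope(s2))

    @classmethod
    def compare_left(cls, s1, s2, p):
        """Predicate 4: the same comparison just to the left of p."""
        return sign(cls.slope(s2) - cls.slope(s1))
```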

In order to support the construction of an arrangement of curves (more pre- 
cisely, a Planar_map_with_intersections_2) one should work with a refined 
traits class. In addition to the requirements of the planar map traits concept, 
it defines a third type that represents a general (not necessarily x-monotone) 
curve in the plane, named Curve_2. An intersection point of the curves is of type 
Point_2. It also lists a few more predicates and geometric constructions on the 
three types as follows: 

1. Given a curve C, subdivide it into simple x-monotone subcurves C1, ..., Ck.

2. Given two x-monotone curves C1, C2 and a point p, find the next intersection point of the two curves that is lexicographically larger, or lexicographically smaller, than p. In degenerate situations, determine the overlap between the two curves.

We include several traits classes with the public distribution of Cgal: a traits class for line segments; a traits class that operates on polylines, namely continuous piecewise linear curves [16]; and a traits class that handles conic arcs, that is, segments of planar algebraic curves of degree 2 such as ellipses, hyperbolas or parabolas [26]. Other traits classes were developed at other sites [13] and are not part of the public distribution. Many users (see, e.g., [7,10,14,19,24]) have employed the arrangement package to develop a variety of applications.

² Notice that when we deal with curved objects, the intersection point may also be a tangency point, so the relative position of the curves to the right of p may be the same as it was to its left.



3 Genericity — The Name of the Game 

In this section we describe how we exploited several generic-programming tech- 
niques to make our arrangement package more modular, extensible, adaptable, 
and efficient. We have tightened the requirements from the traits concept, and 
allowed for an alternative subset of requirements to be fulfilled using a tag- 
dispatching mechanism, enabling easier development of external traits classes. 
At the same time, we have also improved the performance of the built-in traits 
classes and extended their usability through deeper template nesting. 



3.1 Flexibility by Genericity 

When constructing an arrangement using the sweep-line algorithm, we sweep the input set of planar curves from left to right. It is therefore sufficient to find just the next intersection point of a pair of curves to the right of a given reference point, and to compare two curves to the right (and not to the left) of their intersection point.³ Hence, our arrangement traits class needs to provide only a subset of the requirements listed in Section 2.2. Even with the incremental construction algorithm, one may wish to implement a reduced set of traits-class methods in order to avoid code duplication.

We use a tag-dispatching mechanism (see [4] for more details) to select the appropriate implementation. This lets users implement their traits class with an alternative, or even reduced, set of member functions. The traits class is in fact injected as a template parameter into a traits-class wrapper, which also inherits from it. The wrapper serves as a mediator between the general planar-map operations and the primitive traits-class operations. It uses the basic set of methods provided by the base traits class to implement a wider set of methods, using tags to identify which basic methods exist and which are missing.

The tag Has_lef t_category, for example, indicates whether the requirements 
for the two methods below are satisfied: 

1. Given two x-monotone curves Ci, C 2 and one of their intersection points p, 
determine the relative positions of the two curves immediately to the left of p. 

® At first glance, it may seem that implementing the next intersection to the left 
computation is a negligible effort once we implement the next intersection to the 
right computation. However, for some sophisticated traits classes, such as the one 
described in [9], it is a major endeavor. 
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2. Given two a;-monotone curves Ci, C 2 and a point p, find the next intersection 
point of the two curves that is lexicographically smaller than p. 

The tag Has_reflect_category indicates whether an alternative requirement
is satisfied, that is, whether functions that reflect a point or a curve about the
major axes are provided. When the has-left tag is false and the reflect tag is
true, the next intersection point, or a comparison to the left of a reference point,
is computed with the aid of the alternative methods by reflecting the relevant
objects, performing the desired operation to the right, and then reflecting the
results back. This way the user is exempt from providing an implementation of the
“left” methods. If neither set of methods is provided, we resort to (somewhat
less efficient) algorithms based just on the reduced set of provided methods: we
locate a new reference point to the left of the original point, and check what
happens to its right.
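The following C++ sketch shows how the wrapper can dispatch on the two tags. It is not the CGAL wrapper. The tag types, the toy geometry (non-vertical lines y = a*x + b), and the names Right_only_traits, Traits_wrapper, compare_y_at_x_right, compare_y_at_x_left and reflect are all hypothetical, chosen only for this illustration. Only the first two of the three paths described above are shown. The fallback for traits classes that provide neither set of methods is omitted.

```cpp
#include <cassert>

// Hypothetical stand-ins for the library's compile-time tags.
struct Tag_true {};
struct Tag_false {};

enum Comparison_result { SMALLER = -1, EQUAL = 0, LARGER = 1 };

// Toy geometry (assumption): non-vertical lines y = a*x + b and points (x, y).
struct Point_2 { double x, y; };
struct Curve_2 { double a, b; };

// A hypothetical user traits class: it implements only the "right" comparison
// plus reflection about the y-axis, and advertises this through its tags.
struct Right_only_traits {
  typedef Tag_false Has_left_category;
  typedef Tag_true Has_reflect_category;

  // Order of two curves immediately to the right of a common point p:
  // the line with the larger slope lies above.
  Comparison_result compare_y_at_x_right(const Curve_2& c1, const Curve_2& c2,
                                         const Point_2& /* p */) const {
    if (c1.a < c2.a) return SMALLER;
    if (c1.a > c2.a) return LARGER;
    return EQUAL;
  }

  // Reflection about the y-axis maps x to -x.
  Point_2 reflect(const Point_2& p) const { return Point_2{-p.x, p.y}; }
  Curve_2 reflect(const Curve_2& c) const { return Curve_2{-c.a, c.b}; }
};

// Wrapper that inherits from the user's traits class and supplies the "left"
// comparison, choosing an implementation at compile time from the tags.
template <class Traits>
class Traits_wrapper : public Traits {
 public:
  Comparison_result compare_y_at_x_left(const Curve_2& c1, const Curve_2& c2,
                                        const Point_2& p) const {
    return left_impl(c1, c2, p, typename Traits::Has_left_category{},
                     typename Traits::Has_reflect_category{});
  }

 private:
  // Has-left tag is true: forward to the traits class's own implementation.
  template <class Reflect_tag>
  Comparison_result left_impl(const Curve_2& c1, const Curve_2& c2,
                              const Point_2& p, Tag_true, Reflect_tag) const {
    return Traits::compare_y_at_x_left(c1, c2, p);
  }

  // Has-left tag is false, reflect tag is true: reflect the objects, compare
  // to the right, and return the result unchanged (a reflection about the
  // y-axis swaps left and right but preserves the vertical order).
  Comparison_result left_impl(const Curve_2& c1, const Curve_2& c2,
                              const Point_2& p, Tag_false, Tag_true) const {
    return this->compare_y_at_x_right(this->reflect(c1), this->reflect(c2),
                                      this->reflect(p));
  }
};
```

The lines y = 2x and y = x meet at the origin. For them, compare_y_at_x_right returns LARGER. The wrapper's compare_y_at_x_left returns SMALLER, which it obtains purely by reflecting the inputs. The path is chosen by overload resolution on tag types, so a traits class pays nothing at run time for the path it does not use. The body of the unused path is never instantiated.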



3.2 Efficiency by Genericity 

In geometric computing there is a major difference between algorithms that
evaluate predicates only, and algorithms that in addition construct new geometric
objects. A predicate typically computes the sign of an expression used by the
program control, while a constructor produces a new geometric object such as
the intersection point of two segments. If we use an exact number type to ensure
robustness, the newly constructed objects often have a more complex
representation than the input objects (i.e., the bit-length needed for their
representation is often larger). Unless the overall algorithm is carefully designed
to deal with these new objects, constructions will have a severe impact on the
algorithm's performance.

The Arr_segment_traits_2 class is templated with a geometric kernel object
that conforms to the CGAL kernel concept [18] and supplies all the data types,
predicates, and constructions on linear objects. A natural candidate is CGAL's
Cartesian kernel, which represents each segment by its two endpoints, while
each point is represented by two rational coordinates. This simple representation
is very natural, yet it may lead to a cascaded representation of intersection
points with exponentially long bit-length (see Figure 1 for an illustration).*

* A straightforward solution would be to normalize all computations. However, our
experience shows that indiscriminate normalization considerably slows down the
arrangement construction.

To avoid this cascading problem, we introduced the Arr_cached_segment_traits_2
class. This traits class is also templated by a geometric kernel, but uses its
own internal representation of a segment: in addition to the two endpoints it
also stores the coefficients of the underlying line. When a segment is split, the
underlying line of the two resulting sub-segments remains the same and only
their endpoints are updated. When we compute an intersection point of two
segments, we use the coefficients of the corresponding underlying lines and thus
overcome the undesired effect of cascading.
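Here is a minimal C++ sketch of the caching idea. It is not the actual Arr_cached_segment_traits_2. It assumes integer input coordinates and uses 64-bit integers in place of an exact number type, so it is only valid while the products do not overflow. The names Line, Hpoint, Cached_segment, split and intersect_lines are hypothetical. Only the intersection of the two supporting lines is computed; checking that the point lies on both segments is omitted.

```cpp
#include <cassert>

// Supporting line a*x + b*y + c = 0, computed once from the input endpoints.
struct Line { long long a, b, c; };

// Points in homogeneous form (X/W, Y/W); split points may be rational.
struct Hpoint { long long X, Y, W; };

struct Cached_segment {
  Hpoint source, target;  // endpoints, updated when the segment is split
  Line line;              // never changes when the segment is split
};

// Supporting line through two integer points (px, py) and (qx, qy).
Line line_through(long long px, long long py, long long qx, long long qy) {
  long long a = qy - py;
  long long b = px - qx;
  return Line{a, b, -(a * px + b * py)};
}

Cached_segment make_segment(long long px, long long py, long long qx, long long qy) {
  return Cached_segment{Hpoint{px, py, 1}, Hpoint{qx, qy, 1},
                        line_through(px, py, qx, qy)};
}

// Splitting only updates the endpoints; both halves share the original line.
void split(const Cached_segment& s, const Hpoint& p,
           Cached_segment& s1, Cached_segment& s2) {
  Cached_segment left{s.source, p, s.line};
  Cached_segment right{p, s.target, s.line};
  s1 = left;
  s2 = right;
}

// Intersection of two supporting lines by Cramer's rule. Only the cached
// coefficients are used, so the size of the result does not depend on how
// many times either segment was split before.
Hpoint intersect_lines(const Line& l1, const Line& l2) {
  long long det = l1.a * l2.b - l2.a * l1.b;
  assert(det != 0 && "lines must not be parallel");
  return Hpoint{l1.b * l2.c - l2.b * l1.c, l2.a * l1.c - l1.a * l2.c, det};
}
```

Take the two diagonals of the square with corners (0, 0) and (4, 4). For them, intersect_lines returns the homogeneous point (-64, -64, -32), that is, (2, 2). After the first diagonal is split there, both halves still carry the integer coefficients (4, -4, 0) of its supporting line. Any later intersection with either half therefore starts again from input-sized numbers.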



670 E. Fogel, R. Wein, and D. Halperin 



The cached segment traits-class achieves faster running times than the kernel
segment traits when arrangements with relatively many intersection points
are constructed. It also allows for working with less accurate, yet computationally
efficient number types, such as Quotient<MP_Float>* (see CGAL's manual
at [1]). On the other hand, it uses more space and stores extra data with each
segment, so constructing sparse arrangements could be more efficient with the
kernel (non-cached) traits-class implementation. As our software is generic, users
can easily switch between the two traits classes and check which one is more
suitable for their application by changing just a few lines of code. An experimental
comparison of the two types of segment traits-classes is presented in Section 5.

* MP_Float represents floating-point numbers with an unbounded mantissa, but with
a bounded exponent. In some cases (see, e.g., Figure 1) the exponent may overflow.

3.3 Succinctness by Genericity

Polylines are of particular interest, as they can be used to approximate
higher-degree algebraic curves, and at the same time they are easier to deal
with than, for example, conics.†

† With polylines it is sufficient to use an exact rational number type.

Previous releases of CGAL included a stand-alone polyline traits class. This
class represented a polyline as a list of points and performed all geometric
operations on this list (see [16] for more details). We have recently rewritten
the polyline traits-class, making it a class template named
Arr_polyline_traits_2 that is parametrized with Segment_traits, a traits
class appropriate for handling segments. The polyline is implemented as a
vector of Segment_traits::Curve_2 objects (namely, of segments). The new
polyline traits-class does not perform any geometric operation directly. Instead,
it relies solely on the functionality of the segment traits. For example, when
we wish to determine the position of a point with respect to an x-monotone
polyline, we use binary search to locate the relevant segment that contains the
point in its x-range, then we compare the point to this segment. Operations on
polylines of size m therefore take O(log m) time.
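Here is a minimal C++ sketch of the polyline predicate described above. It assumes an x-monotone polyline stored as a vector of non-vertical segments sorted by x, with consecutive segments sharing endpoints. The types and both compare_y_at_x overloads are hypothetical stand-ins for the segment and polyline traits. Doubles replace an exact number type for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { double x, y; };
struct Segment { Point left, right; };  // left.x < right.x (non-vertical)

enum Comparison_result { SMALLER = -1, EQUAL = 0, LARGER = 1 };

// Segment-level predicate (stands in for the segment traits): is p below,
// on, or above segment s? Requires p.x to lie in the x-range of s.
Comparison_result compare_y_at_x(const Point& p, const Segment& s) {
  double t = (p.x - s.left.x) / (s.right.x - s.left.x);
  double y = s.left.y + t * (s.right.y - s.left.y);
  return p.y < y ? SMALLER : (p.y > y ? LARGER : EQUAL);
}

// Polyline-level predicate: binary search for the segment whose x-range
// contains p.x, then delegate to the segment predicate. O(log m) time.
Comparison_result compare_y_at_x(const Point& p, const std::vector<Segment>& polyline) {
  assert(!polyline.empty());
  assert(polyline.front().left.x <= p.x && p.x <= polyline.back().right.x);
  // First segment whose right endpoint is not to the left of p.
  auto it = std::lower_bound(polyline.begin(), polyline.end(), p.x,
                             [](const Segment& s, double x) { return s.right.x < x; });
  return compare_y_at_x(p, *it);
}
```

The polyline overload does no arithmetic of its own. It only searches and then delegates to the segment-level predicate, mirroring the design above in which the polyline traits relies solely on the segment traits. For the polyline (0,0), (1,1), (3,1), (4,0), the results are:
- (2, 2) is reported as LARGER (above the polyline).
- (0.5, 0.5) is reported as EQUAL.
- (3.5, 0) is reported as SMALLER.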




Fig. 1. Cascaded computation of the coordinates of 
the intersection points in an arrangement of four seg- 
ments. The order of insertion of the segments is indi- 
cated in brackets. Notice the exponential growth of 
the bitlengths of the intersection-point coordinates. 
Users are free to choose the underlying segment traits, giving them the ability
to use the kernel (non-cached) segment traits or the cached segment traits,
depending on the number of expected intersection points. Moreover, it is possible
to instantiate the polyline traits template with a data traits-class that handles
segments with some additional data (see the next section). This makes it possible
to associate some data with the entire polyline and possibly different data
with each of the segments that comprise it.

4 The Data Meta-traits 

Additional information can be maintained either by extending the vertex,
half-edge, or face types provided by the topological map through inheritance, or
alternatively by extending their geometric mappings, that is, the point and
curve types. The former option should be used to retain additional information
related to the planar-map topology, and is done by instantiating an appropriate
DCEL, which can be conveniently derived from the one provided with the CGAL
distribution. The latter option can be carried out by extending the geometric
types of the kernel, as the kernel is fully adaptable and extensible [18], but this
indiscriminate extension could lead to undue space consumption. Extending
only the curve type of the planar map is easy with the meta-traits class that we
describe next.

We define a simple yet powerful meta-traits class template called data traits. 
The data traits-class is used to extend planar-map curve types with additional 
data. It should be instantiated with a regular traits-class, referred to as the base 
traits-class, and a class that contains all extraneous data associated with a curve. 

template <class Base_traits, class Data>
class Arr_curve_data_traits_2 : public Base_traits {
public:
  // (1) Type definitions.
  // (2) Overridden functions.
};



The base traits-class must meet all the requirements detailed in Section 2.2 
including the definition of the types Point_2, Curve_2, and X_monotone_curve_2. 
The data traits-class redefines its Curve_2 and X_monotone_curve_2 types as de- 
rived classes from the respective types in the base traits-class, with an additional 
data field. 

// (1) Type definitions:
typedef typename Base_traits::Curve_2             Base_curve_2;
typedef typename Base_traits::X_monotone_curve_2  Base_x_mon_curve_2;
typedef typename Base_traits::Point_2             Point_2;

class Curve_2 : public Base_curve_2 {
  Data m_data;  // Additional data.
public:
  Curve_2 (const Base_curve_2& cv, const Data& dat);
  const Data& get_data () const;
  void set_data (const Data& data);
};

class X_monotone_curve_2 : public Base_x_mon_curve_2 {
  Data m_data;  // Additional data.
public:
  X_monotone_curve_2 (const Base_x_mon_curve_2& cv, const Data& dat);
  const Data& get_data () const;
  void set_data (const Data& data);
};

The base traits-class must support all necessary predicates and geometric 
constructions on curves of the specific family it handles. The data traits-class 
inherits from the base traits-class all the geometric predicates and some of the 
geometric constructions of point objects. It only has to override the two functions 
that deal with constructions of x-monotone curves: 

• It uses the Base_traits::curve_make_x_monotone() function to subdivide
the basic curve into basic x-monotone subcurves. It then constructs the
output subcurves by copying the data from the original curve to each of the
output x-monotone subcurves.

• Similarly, the Base_traits::curve_split() function is used to split an
x-monotone curve, then its data is copied to each of the two resulting
subcurves.

// (2) Overridden functions:
template <class Output_iterator>
void curve_make_x_monotone (const Curve_2& cv,
                            Output_iterator& x_cvs) const;

void curve_split (const X_monotone_curve_2& cv, const Point_2& p,
                  X_monotone_curve_2& c1, X_monotone_curve_2& c2) const;
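The following self-contained C++ sketch combines the declarations above and instantiates them with a toy base traits-class. Toy_segment_traits is hypothetical: every curve is a non-vertical segment, so subdividing into x-monotone pieces is trivial. The member-function bodies are one plausible implementation of the behavior described above, not CGAL's code.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical toy base traits: every curve is a non-vertical segment, so a
// Curve_2 is already x-monotone and both curve types coincide.
struct Toy_segment_traits {
  struct Point_2 { double x, y; };
  struct X_monotone_curve_2 { Point_2 source, target; };
  typedef X_monotone_curve_2 Curve_2;

  template <class Output_iterator>
  void curve_make_x_monotone(const Curve_2& cv, Output_iterator& x_cvs) const {
    *x_cvs++ = cv;  // a segment is its own single x-monotone piece
  }

  void curve_split(const X_monotone_curve_2& cv, const Point_2& p,
                   X_monotone_curve_2& c1, X_monotone_curve_2& c2) const {
    c1 = X_monotone_curve_2{cv.source, p};
    c2 = X_monotone_curve_2{p, cv.target};
  }
};

template <class Base_traits, class Data>
class Arr_curve_data_traits_2 : public Base_traits {
 public:
  // (1) Type definitions.
  typedef typename Base_traits::Curve_2 Base_curve_2;
  typedef typename Base_traits::X_monotone_curve_2 Base_x_mon_curve_2;
  typedef typename Base_traits::Point_2 Point_2;

  class Curve_2 : public Base_curve_2 {
    Data m_data;  // Additional data.
   public:
    Curve_2(const Base_curve_2& cv, const Data& dat) : Base_curve_2(cv), m_data(dat) {}
    const Data& get_data() const { return m_data; }
    void set_data(const Data& data) { m_data = data; }
  };

  class X_monotone_curve_2 : public Base_x_mon_curve_2 {
    Data m_data;  // Additional data.
   public:
    X_monotone_curve_2() = default;
    X_monotone_curve_2(const Base_x_mon_curve_2& cv, const Data& dat)
        : Base_x_mon_curve_2(cv), m_data(dat) {}
    const Data& get_data() const { return m_data; }
    void set_data(const Data& data) { m_data = data; }
  };

  // (2) Overridden functions.
  template <class Output_iterator>
  void curve_make_x_monotone(const Curve_2& cv, Output_iterator& x_cvs) const {
    // Let the base traits subdivide the curve, then copy cv's data to each piece.
    std::vector<Base_x_mon_curve_2> pieces;
    auto out = std::back_inserter(pieces);
    Base_traits::curve_make_x_monotone(cv, out);
    for (const Base_x_mon_curve_2& piece : pieces)
      *x_cvs++ = X_monotone_curve_2(piece, cv.get_data());
  }

  void curve_split(const X_monotone_curve_2& cv, const Point_2& p,
                   X_monotone_curve_2& c1, X_monotone_curve_2& c2) const {
    // Let the base traits split the curve, then copy cv's data to both halves.
    Base_x_mon_curve_2 b1{}, b2{};
    Base_traits::curve_split(cv, p, b1, b2);
    c1 = X_monotone_curve_2(b1, cv.get_data());
    c2 = X_monotone_curve_2(b2, cv.get_data());
  }
};
```

For example, splitting an X_monotone_curve_2 that carries the string "A1" at its midpoint yields two halves whose get_data() both return "A1". Nothing in the base traits-class had to know about the extra field.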



4.1 Constructing the Arrangement Hierarchy 

Using the data traits-class with the appropriate parameters, it is simple to im- 
plement the arrangement hierarchy (see Section 2.1) without any additional data 
structures. Given a set of input curves, we construct a planar map that represents 
their planar arrangement. Since we want to be able to identify the originating 
input curve of each half-edge in the map, each subcurve we create is extended 
with a pointer to the input curve it originated from, using the data-traits mech- 
anism as follows: Suppose that we have a base traits-class that supplies all the 
geometric methods needed to construct a planar arrangement of curves of some 
specific kind (e.g., segments or conic arcs). We define a data traits-class in the 
following form: 

Arr_curve_data_traits_2<Base_traits, Base_traits::Curve_2*> traits;
When constructing the arrangement, we keep all our base input-curves in
a container. Each curve we insert into the arrangement is then formed of a base
curve and a pointer to this base curve. Each time a subcurve is created, the
pointer to the base input-curve is copied, and can be easily retrieved later.
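The mechanism reduces to the following C++ sketch, in which the arrangement itself is left out. Segment, Subcurve, split and origin_index are hypothetical names. Each subcurve carries a pointer into the container of input curves, and every split copies that pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input curve type.
struct Segment { double x1, y1, x2, y2; };

// A subcurve: its own geometry plus a pointer to the input curve it came from.
struct Subcurve {
  Segment piece;
  const Segment* origin;
};

// Split c at parameter t in (0, 1); both halves inherit c's origin pointer.
void split(const Subcurve& c, double t, Subcurve& c1, Subcurve& c2) {
  const Segment& s = c.piece;
  double mx = s.x1 + t * (s.x2 - s.x1);
  double my = s.y1 + t * (s.y2 - s.y1);
  Subcurve left{Segment{s.x1, s.y1, mx, my}, c.origin};
  Subcurve right{Segment{mx, my, s.x2, s.y2}, c.origin};
  c1 = left;
  c2 = right;
}

// Recover the position of the originating input curve in its container.
std::size_t origin_index(const Subcurve& c, const std::vector<Segment>& inputs) {
  return static_cast<std::size_t>(c.origin - inputs.data());
}
```

Pointers into a std::vector stay valid only while the vector is not reallocated. A sketch like this therefore assumes that the container of input curves is fully built before the first curve is inserted.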

4.2 An Additional Example 

Suppose that we are given a few sets of data for some country: A geographical 
map of the country divided into regions, the national road and railroad network, 
and the water routes. Roads, railroads, and rivers are represented as polylines 
and have attributes (e.g., a name). We wish to obtain all crossroads and all 
bridges in some region of this country. 

Using the data traits we can give a straightforward solution to this problem. 
Suppose that the class Polyline_traits_2 supplies all the necessary geometric
type definitions and methods for polylines, fulfilling the requirements of the traits
concept. We define the following classes:

struct My_data {
  enum {ROAD, RAILROAD, RIVER} type;
  std::string name;
};

Arr_curve_data_traits_2<Polyline_traits_2, My_data> traits;

Each curve consists of a base polyline-curve (e.g., a road or a river) and its
data (a type and a name). We construct the arrangement of all curves in our
datasets overlaid on top of the regional map. Then, we can simply go over all
arrangement vertices located in the desired region, and examine the half-edges
around each vertex. If we find only ROAD or RAILROAD half-edges around a vertex,
we can conclude it represents a crossroad. A vertex where half-edges of types
ROAD (or RAILROAD) and RIVER meet represents a bridge. In either case, we can
easily retrieve the names of the intersecting roads or rivers and present them
as part of the output.
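The classification rule above can be sketched in C++ as follows. Gathering the data of the half-edges around a vertex depends on the arrangement interface and is not shown. Vertex_kind and classify are hypothetical names, while My_data repeats the struct above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The curve data from the text.
struct My_data {
  enum {ROAD, RAILROAD, RIVER} type;
  std::string name;
};

// Hypothetical result type and classifier for one arrangement vertex, given
// the data of the curves of the half-edges around it.
enum Vertex_kind { OTHER, CROSSROAD, BRIDGE };

Vertex_kind classify(const std::vector<My_data>& around) {
  bool has_land_route = false, has_river = false;
  for (const My_data& d : around) {
    if (d.type == My_data::RIVER)
      has_river = true;
    else
      has_land_route = true;  // ROAD or RAILROAD
  }
  if (has_land_route && has_river) return BRIDGE;
  if (has_land_route) return CROSSROAD;
  return OTHER;
}
```

As stated, the rule treats any vertex surrounded only by road or railroad half-edges as a crossroad. A practical version would presumably also skip vertices where a single polyline merely bends, for example by requiring at least two distinct names.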

5 Experimental Results 

As mentioned above, there are many ways to represent line segments in the plane.
In the first set of experiments we compared the performance of some
representations. When the CGAL Cartesian kernel is used as the template parameter
of the traits class (Arr_segment_traits_2, for example), the user is still free
to choose among different number types. We conducted our experiments with
(i) Quotient<MP_Float>, (ii) Quotient<Gmpz>, representing the numerator and
denominator as two unbounded integers, (iii) Gmpq, GMP's rational class, and
(iv) leda_rational, an efficient implementation of exact rationals. In addition,
we used an external geometric kernel, called the LEDA rational kernel [22], that
uses floating-point filtering to speed up computations.

We have tested each representation on ten input sets, containing 100 to 1000
random input segments, having a quadratic number of intersection points. The
results are summarized in Figure 2. The cached traits-class achieves better
performance for all tested configurations. Moreover, it is not possible
to construct an arrangement using the Quotient<MP_Float> number type
with the kernel traits (see Section 3.2). For lack of space, we do not show
experiments with sparse arrangements.

Fig. 2. Construction times for arrangements of random line segments: (a) using the
kernel (non-cached) segment traits, (b) using the cached segment traits.

It is worth mentioning that switching from one configuration to another
requires a change of just a few lines of code. In fact, we used a benchmarking
toolkit that automatically generates all the required configurations and
measures the performance of each configuration on a set of inputs.

Figure 3 shows the output of the algorithm presented in Section 4.2. The
input set, consisting of more than 900 polylines, represents major roads,
railroads, rivers and water canals in the Netherlands. The arrangement
construction takes just a few milliseconds.

Fig. 3. Roads, railroads, rivers and water canals on the map of the Netherlands.
Bridges are marked by circles.



6 Conclusions and Future Work 

We have shown how our arrangement package can be used with various components
and different underlying algorithms that can be plugged in using the appropriate
traits class. Users may select the configuration that is most suitable for their
application from the variety offered in CGAL or in its accompanying software
libraries, or implement their own traits class. Switching between different traits
classes typically involves just a minor change of a few lines of code.

We have shown how careful software design based on the generic-programming
paradigm makes it easier to adapt existing traits classes or even to
develop new ones. We believe that similar techniques can be employed in other
software packages from other disciplines as well.

In the future we plan to augment the polyline traits-class into a traits class
that handles piecewise general curves that are not necessarily linear, and to
provide means to extend the planar-map point type in a similar way to how the
curve data-traits extends the curve type.
Abstract. The computation of dominators in a flowgraph has applications
in program optimization, circuit testing, and other areas. Lengauer
and Tarjan [17] proposed two versions of a fast algorithm for finding dominators
and compared them experimentally with an iterative bit-vector algorithm.
They concluded that both versions of their algorithm were much
faster than the bit-vector algorithm even on graphs of moderate size.
Recently Cooper et al. [9] have proposed a new, simple, tree-based iterative
algorithm. Their experiments suggested that it was faster than the
simple version of the Lengauer-Tarjan algorithm on graphs representing
computer program control flow. Motivated by the work of Cooper et al.,
we present an experimental study comparing their algorithm (and some
variants) with careful implementations of both versions of the Lengauer-Tarjan
algorithm and with a new hybrid algorithm. Our results suggest
that, although the performance of all the algorithms is similar, the most
consistently fast are the simple Lengauer-Tarjan algorithm and the hybrid
algorithm, and their advantage increases as the graph gets bigger
or more complicated.



1 Introduction 

A flowgraph $G = (V, A, r)$ is a directed graph with $|V| = n$ vertices and
$|A| = m$ arcs such that every vertex is reachable from a distinguished root vertex
$r \in V$. A vertex $w$ dominates a vertex $v$ if every path from $r$ to $v$ includes $w$.
Our goal is to find for each vertex $v$ in $V$ the set $\mathrm{Dom}(v)$ of all vertices that
dominate $v$. Certain applications require computing the postdominators of $G$,
defined as the dominators in the graph obtained from $G$ by reversing all arc orientations.

Compilers use dominance information extensively during program analysis 
and optimization, for such diverse goals as natural loop detection (which enables 
a host of optimizations), structural analysis [20], scheduling [22], and the compu- 
tation of dependence graphs and static single-assignment forms [10]. Dominators 
are also used to identify pairs of equivalent line faults in VLSI circuits [7]. 

The problem of finding dominators has been extensively studied. In 1972
Allen and Cocke showed that the dominance relation can be computed iteratively
from a set of data-flow equations [5]. A direct implementation of this solution
has $O(mn^2)$ worst-case time. Purdom and Moore [18] gave a straightforward
algorithm with complexity $O(mn)$. It consists of performing a search in $G - v$
for all $v \in V$ ($v$ obviously dominates all the vertices that become unreachable).
Improving on previous work by Tarjan [23], Lengauer and Tarjan [17] proposed
an $O(m\alpha(m, n))$-time algorithm, where $\alpha(m, n)$ is an extremely slow-growing
functional inverse of the Ackermann function. Alstrup et al. [6] gave a linear-time
solution for the random-access model; a simpler solution was given by
Buchsbaum et al. [8]. Georgiadis and Tarjan [12] achieved the first linear-time
algorithm for the pointer-machine model.
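Here is a direct C++ sketch of the Purdom-Moore approach. It assumes vertices numbered 0 to n-1 and adjacency lists of successors; the function name and representation are our own choices, not taken from [18]. Each of the $n - 1$ searches takes $O(m)$ time, which gives the $O(mn)$ bound. The n-by-n result matrix adds $O(n^2)$ space.

```cpp
#include <cassert>
#include <vector>

// For each vertex v != r, search G - v from r; v dominates exactly the
// vertices that become unreachable (and itself). adj[u] lists the heads of
// the arcs leaving u. Returns dom with dom[w][v] == true iff v dominates w.
std::vector<std::vector<bool>> purdom_moore(const std::vector<std::vector<int>>& adj, int r) {
  int n = static_cast<int>(adj.size());
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, false));
  for (int v = 0; v < n; ++v) {
    dom[v][r] = true;  // r dominates every vertex
    dom[v][v] = true;  // every vertex dominates itself
  }
  for (int v = 0; v < n; ++v) {
    if (v == r) continue;
    // Iterative DFS from r that never enters v.
    std::vector<bool> seen(n, false);
    std::vector<int> stack = {r};
    seen[r] = true;
    while (!stack.empty()) {
      int u = stack.back();
      stack.pop_back();
      for (int w : adj[u]) {
        if (w != v && !seen[w]) {
          seen[w] = true;
          stack.push_back(w);
        }
      }
    }
    for (int w = 0; w < n; ++w)
      if (w != v && !seen[w]) dom[w][v] = true;  // v dominates w
  }
  return dom;
}
```

Consider the graph with arcs 0→1, 0→2, 1→3, 2→3, 3→4 and root 0. Besides the root and vertex 4 itself, only vertex 3 dominates 4, because removing 3 is the only way to cut 4 off from the root.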

Experimental results for the dominators problem appear in [17,8,9]. In [17]
Lengauer and Tarjan found the almost-linear-time version of their algorithm
(LT) to be faster than the simple $O(m \log n)$ version even for small graphs.
They also show that Purdom-Moore [18] is only competitive for graphs with 
fewer than 20 vertices, and that a bit-vector implementation of the iterative 
algorithm, by Aho and Ullman [4], is 2.5 times slower than LT for graphs with 
more than 100 vertices. Buchsbaum et al. [8] show that their claimed linear- 
time algorithm has low constants, being only about 10% to 20% slower than 
LT for graphs with more than 300 vertices. This algorithm was later shown to 
have the same time complexity as LT [12], and the corrected version is more 
complicated (see Corrigendum of [8]). Cooper et al. [9] present a space-efficient 
implementation of the iterative algorithm, which they claimed to be 2.5 times 
faster than the simple version of LT. However, a more careful implementation of 
LT later led to different results (personal communication). 

In this paper we cast the iterative algorithm into a more general framework 
and explore the effects of different initializations and processing orderings. We 
also discuss implementation issues that make both versions of LT faster in prac- 
tice and competitive with simpler algorithms even for small graphs. Furthermore, 
we describe a new algorithm that combines LT with the iterative algorithm and 
is very fast in practice. Finally, we present a thorough experimental analysis 
of various algorithms using real as well as artificial data. We did not include 
linear-time algorithms in our study; they are significantly more complex and 
thus unlikely to be faster than LT in practice. 

2 Algorithms 

The immediate dominator of a vertex $v$, denoted by $\mathrm{idom}(v)$, is the unique vertex
$w \neq v$ that dominates $v$ and is dominated by all vertices in $\mathrm{Dom}(v) - v$. The
(immediate) dominator tree is a directed tree $I$ rooted at $r$ and formed by the
edges $\{(\mathrm{idom}(v), v) \mid v \in V - r\}$. A vertex $w$ dominates $v$ if and only if $w$ is a
proper ancestor of $v$ in $I$ [3], so it suffices to compute the immediate dominators.
Throughout this paper the notation “$v \xrightarrow{*}_F u$” means that $v$ is an ancestor of $u$
in the forest $F$ and “$v \xrightarrow{+}_F u$” means that $v$ is a proper ancestor of $u$ in $F$. We
omit the subscript when the context is clear. Also, we denote by $\mathrm{pred}(v)$ the set
of predecessors in $G$ of vertex $v$. Finally, for any subset $U \subseteq V$ and a tree $T$,
$\mathrm{NCA}(T, U)$ denotes the nearest common ancestor of $U \cap T$ in $T$.
2.1 The Iterative Algorithm

Clearly $\mathrm{Dom}(r) = \{r\}$. For each of the remaining vertices, the set of dominators
is the solution to the following data-flow equations:

$$\mathrm{Dom}'(v) = \Big( \bigcap_{u \in \mathrm{pred}(v)} \mathrm{Dom}'(u) \Big) \cup \{v\}, \qquad \forall\, v \in V - r. \tag{1}$$

Allen and Cocke [5] showed that one can iteratively find the maximal fixed-point
solution $\mathrm{Dom}'(v) = \mathrm{Dom}(v)$ for all $v$. Typically the algorithm either initializes
$\mathrm{Dom}'(v) \leftarrow V$ for all $v \neq r$, or excludes uninitialized $\mathrm{Dom}'(u)$ sets from
the intersection in (1). Cooper et al. [9] observe that the iterative algorithm
does not need to keep each $\mathrm{Dom}'$ set explicitly; it suffices to maintain the
transitive reduction of the dominance relation, which is a tree $T$. The intersection
of $\mathrm{Dom}'(u)$ and $\mathrm{Dom}'(w)$ is the path from $\mathrm{NCA}(T, \{u, w\})$ to $r$. Any spanning
(sub)tree $S$ of $G$ rooted at $r$ is a valid initialization for $T$, since for any $v \in S$
only vertices in $r \xrightarrow{*}_S v$ can dominate $v$.

It is known [19] that the dominator tree $I$ is such that if $w$ is a vertex in
$V - r$ then $\mathrm{idom}(w) = \mathrm{NCA}(I, \mathrm{pred}(w))$. Thus the iterative algorithm can be
interpreted as a process that modifies a tree $T$ successively until this property
holds. The number of iterations depends on the order in which the vertices (or
edges) are processed. Kam and Ullman [15] show that certain dataflow equations,
including (1), are solved in up to $d(G, D) + 3$ iterations when the vertices are
processed in reverse postorder with respect to a DFS tree $D$. Here $d(G, D)$ is
the loop connectedness of $G$ with respect to $D$, the largest number of back edges
found in any cycle-free path of $G$. When $G$ is reducible [13] the dominator tree
is built in one iteration, because $v$ dominates $u$ whenever $(u, v)$ is a back edge.

The running time per iteration is dominated by the time spent on NCA
calculations. If they are performed naively (ascending the tree paths until they
meet), then a single iteration costs $O(mn)$ time. Because there may be up to
$O(n)$ iterations, the running time is $O(mn^2)$. The iterative algorithm runs much
faster in practice, however. Typically $d(G, D) < 3$ [16], and it is reasonable to
expect that few NCA calculations will require $O(n)$ time. If $T$ is represented as
a dynamic tree [21], the worst-case bound per iteration is reduced to $O(m \log n)$,
but the implementation becomes much more complicated.



Initializations and vertex orderings. Our base implementation of the iterative
algorithm (IDFS) starts with $T = \{r\}$ and processes the vertices in reverse
postorder with respect to a DFS tree, as done in [9]. This requires a preprocessing
phase that executes a DFS on the graph and assigns a postorder number to each
vertex. Initializing $T$ as a DFS tree is bad both in theory and in practice because
it causes the back edges to be processed, even though they contribute nothing to
the NCAs. Intuitively, a much better initial approximation of the dominator tree
is a BFS tree. We implemented a variant of the iterative algorithm (which we
call IBFS) that starts with such a tree and processes the vertices in BFS order.
As Section 4 shows, this method is often (but not always) faster than IDFS.
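The following C++ sketch follows the IDFS scheme described above, in the style of Cooper et al. [9]:
- $T$ starts as $\{r\}$.
- Vertices are processed in reverse postorder.
- Predecessors not yet in $T$ are excluded from the intersection.
- NCAs are found by naively walking up $T$ and comparing postorder numbers.

Vertices are numbered 0 to n-1 and assumed reachable from $r$. The names and the adjacency-list representation are assumptions, and this is not the authors' benchmarked code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns idom with idom[r] == r.
std::vector<int> iterative_dominators(const std::vector<std::vector<int>>& succ,
                                      const std::vector<std::vector<int>>& pred,
                                      int r) {
  int n = static_cast<int>(succ.size());
  // Preprocessing: iterative DFS from r assigning postorder numbers.
  std::vector<int> post(n, -1), order;  // order[k] is the vertex with postorder k
  std::vector<std::size_t> next(n, 0);  // next successor to examine
  std::vector<bool> visited(n, false);
  std::vector<int> stack = {r};
  visited[r] = true;
  while (!stack.empty()) {
    int u = stack.back();
    if (next[u] < succ[u].size()) {
      int w = succ[u][next[u]++];
      if (!visited[w]) {
        visited[w] = true;
        stack.push_back(w);
      }
    } else {
      post[u] = static_cast<int>(order.size());
      order.push_back(u);
      stack.pop_back();
    }
  }

  // T is stored as parent pointers idom[]; -1 means "not yet in T".
  std::vector<int> idom(n, -1);
  idom[r] = r;
  // Naive NCA: repeatedly move up from the vertex with the smaller postorder number.
  auto nca = [&](int a, int b) {
    while (a != b) {
      while (post[a] < post[b]) a = idom[a];
      while (post[b] < post[a]) b = idom[b];
    }
    return a;
  };

  bool changed = true;
  while (changed) {
    changed = false;
    // Reverse postorder, skipping r (which has the largest postorder number).
    for (int k = static_cast<int>(order.size()) - 2; k >= 0; --k) {
      int v = order[k];
      int new_idom = -1;
      for (int u : pred[v]) {
        if (idom[u] == -1) continue;  // exclude predecessors not yet in T
        new_idom = (new_idom == -1) ? u : nca(u, new_idom);
      }
      if (new_idom != idom[v]) {
        idom[v] = new_idom;
        changed = true;
      }
    }
  }
  return idom;
}
```

Consider the irreducible graph with arcs 0→1, 0→2, 1→2, 2→1. On it, the sketch reaches idom = (0, 0, 0) in its first pass and confirms it in a second pass that changes nothing.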
Finally, we note that there is an ordering $\sigma$ of the edges which is optimal
with respect to the number of iterations that are needed for convergence. If we
initialize $T = \{r\}$ and process the edges according to $\sigma$, then after one iteration
we will have constructed the dominator tree. We are currently investigating whether
such an ordering can be found efficiently.



2.2 The Lengauer-Tarjan Algorithm 

The Lengauer-Tarjan algorithm starts with a depth-first search on $G$ from $r$ and
assigns preorder numbers to the vertices. The resulting DFS tree $D$ is represented
by an array parent. For simplicity, we refer to the vertices of $G$ by their preorder
number, so $v < u$ means that $v$ has a lower preorder number than $u$. The algorithm
is based on the concept of semidominators, which give an initial approximation
to the immediate dominators. A path $P = (u = v_0, v_1, \ldots, v_{k-1}, v_k = v)$
in $G$ is a semidominator path if $v_i > v$ for $1 \le i \le k - 1$. The semidominator of
$v$ is defined as $\mathrm{sdom}(v) = \min\{u \mid \text{there is a semidominator path from } u \text{ to } v\}$.

Semidominators and immediate dominators are computed by finding minimum
sdom values on paths of $D$. Vertices are processed in reverse preorder,
which ensures that all the necessary values are available when needed. The
algorithm maintains a forest $F$ such that when it needs the minimum $\mathrm{sdom}(u)$ on a
path $p = w \xrightarrow{+} u \xrightarrow{*} v$, $w$ will be the root of a tree in $F$ containing all vertices on
$p$ (in general, the root of the tree containing $v$ is denoted by $\mathrm{root}_F(v)$). Every
vertex in $V$ starts as a singleton in $F$. Two operations are defined on $F$: $\mathrm{link}(v)$
links the tree rooted at $v$ to the tree rooted at $\mathrm{parent}[v]$; $\mathrm{eval}(v)$ returns a vertex
$u$ of minimum sdom among those satisfying $\mathrm{root}_F(v) \xrightarrow{+} u \xrightarrow{*} v$.

Every vertex $w$ is processed three times. First, $\mathrm{sdom}(w)$ is computed and $w$ is
inserted into a bucket associated with vertex $\mathrm{sdom}(w)$. The algorithm processes
$w$ again after $\mathrm{sdom}(v)$ has been computed, where $v$ satisfies $\mathrm{parent}[v] = \mathrm{sdom}(w)$
and $v \xrightarrow{*} w$; then it finds either the immediate dominator or a relative dominator
of $w$ (an ancestor of $w$ that has the same immediate dominator as $w$). Finally,
immediate dominators are derived from relative dominators in a preorder pass.

With a simple implementation of link-eval (using path compression but not
balancing), the LT algorithm runs in $O(m \log n)$ time [25]. With a more
sophisticated linking strategy that ensures that $F$ is balanced, LT runs in
$O(m\alpha(m, n))$ time [24]. We refer to these two versions as SLT and LT, respectively.



Implementation issues. Buckets have very specific properties in the Lengauer-Tarjan
algorithm: (1) every vertex is inserted into at most one bucket; (2) there is
exactly one bucket associated with each vertex; (3) vertex $i$ can only be inserted
into some bucket after bucket $i$ itself is processed. Properties (1) and (2) ensure
that buckets can be implemented with two $n$-sized arrays, first and next: $\mathrm{first}[i]$
represents the first element in bucket $i$, and $\mathrm{next}[v]$ is the element that succeeds
$v$ in the bucket it belongs to. Property (3) ensures that these two arrays can be
combined into a single array bucket: once bucket $i$ has been processed, $\mathrm{first}[i]$ is
no longer needed, so the same slot can then hold $\mathrm{next}[i]$.
In [17], Lengauer and Tarjan process $\mathrm{bucket}[\mathrm{parent}[w]]$ at the end of the
iteration that deals with $w$. A better alternative is to process $\mathrm{bucket}[w]$ at the
beginning of the iteration; each bucket is now processed exactly once, so it need
not be emptied explicitly. Another measure that is relevant in practice is to avoid
unnecessary bucket insertions: a vertex $w$ for which $\mathrm{parent}[w] = \mathrm{sdom}(w)$ is not
inserted into any bucket because we already know that $\mathrm{idom}(w) = \mathrm{parent}[w]$.
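The single-array bucket structure can be sketched in C++ as follows, together with the rule of processing bucket[w] at the beginning of iteration $w$. Vertices are preorder numbers 1 to n, with 0 meaning "empty". The class and member names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Before bucket i is processed, bucket_[i] is the first element of bucket i;
// afterwards (property (3)) the same slot holds the successor of vertex i.
class Buckets {
 public:
  explicit Buckets(int n) : bucket_(n + 1, 0) {}

  // Insert vertex w into bucket s = sdom(w) by pushing it to the front.
  // Must be called only after bucket w itself has been processed.
  void insert(int s, int w) {
    assert(0 < s && s < w);  // sdom(w) is a proper ancestor of w
    bucket_[w] = bucket_[s];
    bucket_[s] = w;
  }

  // Visit every vertex in bucket i; call once, before i is inserted anywhere.
  template <class Visitor>
  void process(int i, Visitor visit) const {
    for (int v = bucket_[i]; v != 0; v = bucket_[v]) visit(v);
  }

 private:
  std::vector<int> bucket_;
};
```

At iteration $w$ (in reverse preorder), call process(w, ...) first and only then insert(sdom(w), w). Skip the insertion when parent[w] = sdom(w). Reversing that order would overwrite the head of bucket $w$ before it has been read.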

2.3 The SEMI-NCA Algorithm 

SEMI-NCA is a new hybrid algorithm for computing dominators that works 
in two phases: the first computes sdom(v) for all ?; £ V — r (as in LT), and 
the second builds I incrementally, using the fact that for any vertex w ^ r, 
idom{w) = IdiG A{I , {par ent[w], sdom{w)}) [12]. In the second phase, for every 
w in preorder we ascend the J-tree path from parent[w\ to r until we meet the 
first vertex x such that x < sdom{w). Then we have x = idom{w). With this 
implementation, the second phase runs in 0{v?) worst-case time. However, we 
expect it to be much faster in practice, since our empirical results indicate that 
sdom{v) is usually a good approximation to idom{v). 

SEMI-NCA is simpler than LT in two ways. First, eval can return the minimum
value itself rather than a vertex that achieves that value. This eliminates
one array and one level of indirect addressing. Second, buckets are no
longer necessary because the vertices are processed in preorder in the second
phase. With the simple implementation of link-eval (which is faster in practice),
this method (SNCA) runs in $O(n^2 + m \log n)$ worst-case time. We note that
Gabow [11] presents a rather complex procedure that computes NCAs in total
linear time on a tree that grows by adding leaves. This implies that the second
phase of SEMI-NCA can run in $O(n)$ time, but it is unlikely to be practical.
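Here is a compact C++ sketch of SNCA as described above, in two phases:
- Phase 1 computes semidominators with the simple link-eval (path compression, no balancing). Here eval returns the minimum semidominator value directly.
- Phase 2 ascends $I$ from parent[w] until it reaches a vertex numbered at most sdom(w).

Vertices are 0 to n-1 and assumed reachable from $r$. compress is recursive for brevity; a production version would avoid deep recursion. All names are our own.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Path compression for the simple link-eval structure; requires ancestor[v] != 0.
// label[v] becomes the minimum semidominator value on the path from v up to,
// but excluding, the root of its tree in F.
static void compress(int v, std::vector<int>& ancestor, std::vector<int>& label) {
  int a = ancestor[v];
  if (ancestor[a] == 0) return;
  compress(a, ancestor, label);
  if (label[a] < label[v]) label[v] = label[a];
  ancestor[v] = ancestor[a];
}

// Returns idom as original vertex ids, with idom[r] == r.
std::vector<int> semi_nca(const std::vector<std::vector<int>>& succ,
                          const std::vector<std::vector<int>>& pred, int r) {
  int n = static_cast<int>(succ.size());
  // DFS assigning preorder numbers 1..n (0 = unvisited); parent[], semi[],
  // label[], ancestor[] and idom[] below are indexed by preorder number.
  std::vector<int> pre(n, 0), vertex(n + 1, -1), parent(n + 1, 0);
  int count = 0;
  std::vector<std::pair<int, int>> stack = {{r, 0}};  // (vertex, parent number)
  while (!stack.empty()) {
    int v = stack.back().first, p = stack.back().second;
    stack.pop_back();
    if (pre[v] != 0) continue;  // already reached along another path
    pre[v] = ++count;
    vertex[count] = v;
    parent[count] = p;
    for (auto it = succ[v].rbegin(); it != succ[v].rend(); ++it)
      if (pre[*it] == 0) stack.push_back({*it, count});
  }

  // Phase 1: semidominators in reverse preorder.
  std::vector<int> semi(count + 1), label(count + 1), ancestor(count + 1, 0);
  for (int i = 1; i <= count; ++i) semi[i] = label[i] = i;
  for (int w = count; w >= 2; --w) {
    for (int u : pred[vertex[w]]) {
      int v = pre[u];
      if (v == 0) continue;  // unreachable predecessor
      if (ancestor[v] != 0) compress(v, ancestor, label);
      if (label[v] < semi[w]) semi[w] = label[v];  // eval(v) is label[v]
    }
    label[w] = semi[w];
    ancestor[w] = parent[w];  // link(w): attach w's tree below parent[w]
  }

  // Phase 2: in preorder, climb I from parent[w] to the first x <= sdom(w).
  std::vector<int> idom(count + 1, 0);
  idom[1] = 1;
  for (int w = 2; w <= count; ++w) {
    int x = parent[w];
    while (x > semi[w]) x = idom[x];
    idom[w] = x;
  }

  std::vector<int> result(n, -1);
  for (int w = 1; w <= count; ++w) result[vertex[w]] = vertex[idom[w]];
  return result;
}
```

On the graph with arcs 0→1, 0→2, 1→3, 2→3, 3→4 and root 0, semi_nca returns (0, 0, 0, 0, 3). For vertex 3, sdom is the root, and the second phase climbs from its DFS parent 1 up to 0.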

3 Worst-Case Behavior 

This section briefly describes families of graphs that elicit the worst-case behavior 
of the algorithms we implemented. Figure 1 shows graph families that favor 
particular methods against the others. Family itworst(k) is a graph with 0{k) 
vertices and 0{k‘^) edges for which IDFS and IBFS need to spend 0{k^) time. 
Family idfsquad(k) has $O(k)$ vertices and $O(k)$ edges and can be processed in
linear time by IBFS, but IDFS needs quadratic time; ibfsquad(k) achieves the
reverse effect. Finally, sncaworst(k) has $O(k)$ vertices and $O(k)$ edges and causes
SNCA, IDFS and IBFS to run in quadratic time. Adding any arc $(y_i, x_k)$ would make
SNCA and IBFS run in $O(k)$ time, but IDFS would still need $O(k^2)$ time. We
also define the family sltworst(k), which causes worst-case behavior for SLT [25].
Because it has $O(k)$ back edges, the iterative methods run in quadratic time.

4 Empirical Analysis 

Based on worst-case bounds only, the sophisticated version of the Lengauer-Tarjan
algorithm is the method of choice among those studied here. In practice,
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itworst(k) 



idfsquad(k) 



ibfsquad(k) sncaworst(k) 




r 



r 




r 



yj y2 yk 



Fig. 1. Worst-case families. 



however, “sophisiticated” algorithms tend to be harder to code and to have 
higher constants, so one might prefer other alternatives. The experiments re- 
ported in this section shed some light on this issue. 

Implementation and Experimental Setup. We implemented all algorithms
in C++. They take as input the graph and its root, and return an n-element
array representing immediate dominators. Vertices are assumed to be integers
from 1 to n. Within reason, we made all implementations as efficient and uniform
as we could. The source code is available from the authors upon request.

The code was compiled using g++ v. 3.2.2 with full optimization (flag -O4).
All tests were conducted on a 1.7 GHz Pentium IV with 256 MB of RAM and
256 kB of cache, running Mandrake Linux. We report CPU times measured
with the getrusage function. Since its precision is only 1/60 second, we ran each
algorithm repeatedly for at least one second; individual times were obtained
by dividing the total time by the number of runs. To minimize fluctuations
due to external factors, we used the machine exclusively for tests, took each
measurement three times, and picked the best. Running times do not include
creating the graph, but they do include allocating and deallocating the arrays
used by each particular algorithm.
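The measurement protocol above is as follows: repeat a run until at least one CPU second has elapsed, divide by the number of runs, and keep the best of three measurements. A minimal sketch of that protocol is below. It uses Python's `time.process_time()` as a stand-in for `getrusage`, which is an assumption; on Unix, `resource.getrusage` would be the closer analogue. The function name and parameters are hypothetical.

```python
import time

def time_per_run(fn, min_total=1.0, trials=3):
    """Best (over `trials`) average CPU time per call of fn().

    Each trial repeats fn() until at least `min_total` CPU seconds have
    elapsed, then divides the elapsed time by the number of runs.
    """
    best = float("inf")
    for _ in range(trials):
        runs = 0
        start = time.process_time()
        while True:
            fn()
            runs += 1
            elapsed = time.process_time() - start
            if elapsed >= min_total:
                break
        best = min(best, elapsed / runs)
    return best
```

Taking the minimum rather than the mean follows the text's "picked the best". The intent is that interference from other processes can only make a measurement slower.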

Instances. We used control-flow graphs produced by the SUIF compiler [14] 
from benchmarks in the SPEC’95 suite [2] and previously tested by Buchsbaum 
et al. [8] in the context of dominator analysis. We also used control-flow graphs 
created by the IMPACT compiler [1] from six programs in the SPEC 2000 suite. 
The instances were divided into series, each corresponding to a single bench- 
mark. Series were further grouped into three classes, SUIF-FP, SUIF-INT, and 
IMPACT. We also considered two variants of IMPACT: class IMPACTP con- 
tains the reverse graphs and is meant to test how effectively the algorithms 
compute postdominators; IMPACTS contains the same instances as IMPACT, 
with parallel edges removed.¹ We also ran the algorithms on circuits from
VLSI-testing applications [7] obtained from the ISCAS’89 suite [26] (all 50 graphs
were considered a single class).

Finally, we tested eight instances that do not occur in any particular ap- 
plication related to dominators. Five are the worst-case instances described in 
Section 3, and the other three are large graphs representing speech recognition 
finite state automata (also used by Buchsbaum et al. [8]). 

Test Results. We start with the following experiment: read an entire series into
memory and compute dominators for each graph in sequence, measuring the total
running time. For each series, Table 1 shows the total number of graphs (g) and
the average number of vertices and edges (n and m). As a reference, we report
the average time (in microseconds) of a simple breadth-first search (BFS) on
each graph. Times for computing dominators are given as multiples of BFS.

In absolute terms, all algorithms are reasonably fast: none was more than 
seven times slower than BFS. Furthermore, despite their different worst-case 
complexities, all methods have remarkably similar behavior in practice. In no 
series was an algorithm twice as fast (or slow) as any other. Differences do exist, 
of course. LT is consistently slower than SLT, which can be explained by the 
complex nature of LT and the relatively small size of the instances tested. The 
iterative methods are faster than LT, but often slower than SLT. Both variants 
(IDFS and IBFS) usually have very similar behavior, although one method is 
occasionally much faster than the other (series 145.fpppp and 256.bzip2 are good
examples). Always within a factor of four of BFS, SNCA and SLT are the most 
consistently fast methods in the set. 

Because we measure the total time per series, the results are naturally biased
towards large graphs. For a more complete view, we also computed running times
for individual instances, and normalized them with respect to BFS. For each
class, Table 2 shows the geometric mean and the geometric standard deviation
of the relative times. Now that each graph is given equal weight, the aggregate
measures for iterative methods (IBFS and IDFS) are somewhat better than
before, particularly for IMPACT instances. Deviations, however, are higher.
Together, these facts suggest that iterative methods are faster than other methods
for small instances, but slower when size increases.
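Each per-class aggregate is a geometric statistic of per-instance ratios, where each ratio is an algorithm's time divided by the BFS time on the same graph. The sketch below computes those statistics under the usual definitions: the exponential of the mean and of the standard deviation of the logarithms. The text does not say whether a population or sample deviation was used; the sketch assumes the population form. The function name is hypothetical.

```python
import math
from statistics import fmean, pstdev

def geometric_summary(ratios):
    """Geometric mean and geometric standard deviation of positive ratios.

    Assumes population (not sample) standard deviation of the logarithms.
    """
    logs = [math.log(r) for r in ratios]
    return math.exp(fmean(logs)), math.exp(pstdev(logs))
```

For example, `geometric_summary([1.0, 4.0])` gives a mean of 2.0 and a deviation factor of 2.0, up to floating-point rounding. The geometric deviation is a multiplicative factor of at least 1. That is why every DEV entry is reported as a number above 1.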

Figure 2 confirms this. Each point represents the mean relative running time
for all graphs in the IMPACT class with the same value of $\lceil \log_2(n+m) \rceil$.
Iterative methods clearly have a much stronger dependence on size than other
algorithms. Almost as fast as a single BFS for very small instances, they become
the slowest alternatives as size increases. The relative performance of the
other methods is the same regardless of size: SNCA is slightly faster than SLT,
and both are significantly faster than LT. A similar behavior was observed for
IMPACTS and IMPACTP; for SUIF, which produces somewhat simpler graphs,
iterative methods remained competitive even for larger sizes.
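The size classes used for Figure 2 group instances by $\lceil \log_2(n+m) \rceil$. A minimal sketch of that grouping follows. It computes the ceiling exactly with integer arithmetic and averages the relative times in each class. The figure's caption says only "mean", so the use of an arithmetic mean here is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def size_bucket(n, m):
    """ceil(log2(n + m)) computed with integers; assumes n + m >= 1."""
    return (n + m - 1).bit_length()

def mean_by_size(instances):
    """instances: iterable of (n, m, relative_time); returns {bucket: mean}."""
    groups = defaultdict(list)
    for n, m, rel in instances:
        groups[size_bucket(n, m)].append(rel)
    return {b: fmean(v) for b, v in sorted(groups.items())}
```

Using `bit_length` avoids floating-point error at exact powers of two. For example, n + m = 8 falls in class 3 and n + m = 9 falls in class 4.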

¹ These edges appear in optimizing compilers due to superblock formation, and are
produced much more often by IMPACT than by SUIF.
Table 1. Complete series: number of graphs (g), average number of vertices (n) and
edges (m), and average time per graph (in microseconds for BFS, and relative to BFS
for other methods). The best result in each row is marked with an asterisk.

CLASS     SERIES           g       n       m     BFS   IDFS   IBFS     LT    SLT   SNCA
CIRCUITS  circuits        50  3228.8  5027.2  228.88   5.41   6.35   4.98   3.80   3.48*
IMPACT    181.mcf         26    26.5    90.3    1.41   4.75   4.36   5.20   3.33   3.25*
          197.parser     324    16.8    55.7    1.22   4.22   3.66   4.39   3.09   2.99*
          254.gap        854    25.3    56.2    1.88   3.12   2.88   3.87   2.71   2.61*
          255.vortex     923    15.1    35.8    1.27   4.04   3.84   4.30   3.24   3.13*
          256.bzip2       74    22.8    70.3    1.26   4.81   3.97   4.88   3.36   3.20*
          300.twolf      191    39.5   115.6    2.52   4.58   4.13   5.01   3.51   3.36*
IMPACTP   181.mcf         26    26.5    90.3    1.41   4.65   4.34   5.09   3.41   3.21*
          197.parser     324    16.8    55.7    1.23   4.13   3.40   4.21   3.01   2.94*
          254.gap        854    25.3    56.2    1.82   3.32   3.44   3.79   2.69   2.68*
          255.vortex     923    15.1    35.8    1.26   4.24   4.03   4.19   3.32*  3.32*
          256.bzip2       74    22.8    70.3    1.28   5.03   3.73   4.78   3.23   3.07*
          300.twolf      191    39.5   115.6    2.52   4.86   4.52   4.88   3.38   3.33*
IMPACTS   181.mcf         26    26.5    72.4    1.30   4.36   4.04   5.22   3.30   3.24*
          197.parser     324    16.8    42.1    1.10   4.10   3.56   4.67   3.42   3.32*
          254.gap        854    25.3    48.8    1.75   3.02   2.82   4.00   2.80   2.66*
          255.vortex     923    15.1    27.1    1.16   2.59   2.41   3.50   2.45   2.34*
          256.bzip2       74    22.8    53.9    1.17   4.25   3.53   4.91   3.33   3.24*
          300.twolf      191    39.5    96.5    2.23   4.50   4.09   5.12   3.50   3.41*
SUIF-FP   101.tomcatv      1   143.0   192.0    4.23   3.42*  3.90   5.78   3.67   3.66
          102.swim         7    26.6    34.4    1.04   2.77*  3.00   4.48   2.97   2.82
          103.su2cor      37    32.3    42.7    1.29   2.82*  2.99   4.68   3.01   3.03
          104.hydro2d     43    35.3    47.0    1.39   2.79*  3.05   4.64   2.94   2.86
          107.mgrid       13    27.2    35.4    1.12   2.58*  3.01   4.25   2.82   2.77
          110.applu       17    62.2    82.8    2.03   3.28*  3.58   5.36   3.45   3.41
          125.turb3d      24    54.0    73.5    1.51   3.57   3.59   6.31   3.66   3.44*
          145.fpppp       37    20.3    26.4    0.82   3.00*  3.43   4.83   3.19   3.19
          146.wave5      110    37.4    50.7    1.43   3.09*  3.11   5.00   3.22   3.15
SUIF-INT  099.go         372    36.6    52.5    1.72   3.12   3.01   4.71   3.00*  3.07
          124.m88ksim    256    27.0    38.7    1.17   3.35   3.10*  4.98   3.16   3.18
          126.gcc       2013    48.3    69.8    2.35   3.00   3.01   4.60   2.91*  2.99
          129.compress    24    12.6    16.7    0.66   2.79   2.46*  3.76   2.60   2.55
          130.li         357     9.8    12.8    0.54   2.59   2.44*  3.92   2.67   2.68
          132.ijpeg      524    14.8    20.1    0.78   2.84   2.60*  4.35   2.84   2.82
          134.perl       215    66.3    98.2    2.74   3.77   3.76   5.43   3.44*  3.50
          147.vortex     923    23.7    34.9    1.35   2.69   2.67   3.92   2.59   2.52*


Finding Dominators in Practice 






Table 2. Times relative to BFS: geometric mean and geometric standard deviation.
The best (lowest) mean in each row is marked with an asterisk.

CLASS     IDFS        IBFS        LT          SLT         SNCA
          MEAN  DEV   MEAN  DEV   MEAN  DEV   MEAN  DEV   MEAN  DEV
CIRCUITS  5.89  1.19  6.17  1.42  6.71  1.18  4.62  1.15  4.40* 1.14
SUIF-FP   2.49  1.44  2.34* 1.58  3.74  1.42  2.54  1.36  2.96  1.38
SUIF-INT  2.45  1.50  2.25* 1.62  3.69  1.40  2.48  1.33  2.73  1.45
IMPACT    2.60  1.65  2.24* 1.77  4.02  1.40  2.74  1.33  2.56  1.31
IMPACTP   2.58  1.63  2.25* 1.82  3.84  1.44  2.61  1.30  2.52  1.29
IMPACTS   2.42  1.55  2.05* 1.68  3.62  1.33  2.50  1.28  2.61  1.45




Fig. 2. Times for IMPACT instances within each size. Each point represents the mean
relative running time (w.r.t. BFS) for all instances with the same value of $\lceil \log_2(n+m) \rceil$.



The results for IMPACT and IMPACTS indicate that the iterative methods
benefit the most from the absence of parallel edges. Because of path compression,
Lengauer-Tarjan and SEMI-NCA can handle repeated edges in constant time.

Table 3 helps explain the relative performance of the methods with three
pieces of information. The first is SDP%, the percentage of vertices (excluding
the root) whose semidominators are their parents in the DFS tree. These vertices
are not inserted into buckets, so large percentages are better for LT and SLT.
On average, far more than half of the vertices have this property. In practice,
avoiding unnecessary bucket insertions resulted in a 5% to 10% speedup.
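The SDP% statistic can be computed directly from the first-phase arrays. The sketch below uses the same assumptions as a preorder-numbered implementation: vertex 0 is the root, and `parent` and `sdom` hold preorder numbers. The names are hypothetical. A vertex w with sdom(w) = parent(w) needs no bucket because the tree path from sdom(w) to w contains only w, so idom(w) = sdom(w) = parent(w) immediately.

```python
def sdp_percentage(parent, sdom, root=0):
    """Percentage of non-root vertices whose semidominator is their DFS parent."""
    others = [v for v in range(len(parent)) if v != root]
    hits = sum(1 for v in others if sdom[v] == parent[v])
    return 100.0 * hits / len(others)
```

On a four-vertex DFS tree 0-1-2-3 with sdom = [0, 0, 0, 1], only vertex 1 qualifies, so the result is one third of the non-root vertices, about 33.3%.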

Table 3. Percentage of vertices that have their parents as semidominators (SDP%),
average number of iterations, and number of comparisons per vertex.

                    ITERATIONS      COMPARISONS PER VERTEX
CLASS     SDP(%)    IDFS    IBFS    IDFS   IBFS     LT    SLT   SNCA
CIRCUITS    76.7  2.8000  3.2000    32.6   39.3   12.0    9.9    8.9
IMPACT      73.4  2.0686  1.4385    30.9   28.0   15.6   12.8   11.1
IMPACTP     88.6  2.0819  1.5376    30.2   32.2   15.5   12.3   10.9
IMPACTS     73.4  2.0686  1.4385    24.8   23.4   13.9   11.2    9.5
SUIF-FP     67.7  2.0000  1.6817    12.3   15.9   10.3    8.3    6.8
SUIF-INT    63.9  2.0009  1.6659    14.9   17.2   11.2    8.6    7.2

The next two columns show the average number of iterations performed by
IDFS and IBFS. It is very close to 2 for IDFS: almost always the second iteration
just confirms that the candidate dominators found in the first are indeed correct.
This is expected for control-flow graphs, which are usually reducible in practice.
On most classes the average is smaller than 2 for IBFS, indicating that the
BFS and dominator trees often coincide. Note that the number of iterations for
IMPACTP is slightly higher than for IMPACT, since the reverse of a reducible
graph may be irreducible.
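The iteration counts above are easier to interpret with the iterative scheme in mind. The following is a minimal sketch in the style of the tree-based algorithm of Cooper et al. It processes vertices in reverse DFS postorder and repeats passes until no immediate dominator changes. The pass count includes the final confirming pass. This is an illustration under those assumptions, not the authors' IDFS or IBFS code. The adjacency-dict input and function name are hypothetical.

```python
def iterative_dominators(succ, root):
    """Iterative dominators (Cooper-Harvey-Kennedy style), as a sketch.

    succ maps each vertex to a list of successors. Returns (idom, passes),
    where passes counts every pass over the vertices, including the last
    pass in which nothing changed.
    """
    # Iterative DFS to obtain a postorder of the vertices reachable from root.
    postorder, seen = [], {root}
    stack = [(root, iter(succ.get(root, ())))]
    while stack:
        v, children = stack[-1]
        for w in children:
            if w not in seen:
                seen.add(w)
                stack.append((w, iter(succ.get(w, ()))))
                break
        else:                       # all children explored
            stack.pop()
            postorder.append(v)
    po = {v: k for k, v in enumerate(postorder)}
    preds = {v: [] for v in postorder}
    for v in postorder:
        for w in succ.get(v, ()):
            preds[w].append(v)

    idom = {root: root}

    def intersect(a, b):
        # Move the "fingers" up the current tree until they meet.
        while a != b:
            while po[a] < po[b]:
                a = idom[a]
            while po[b] < po[a]:
                b = idom[b]
        return a

    passes, changed = 0, True
    while changed:
        changed, passes = False, passes + 1
        for v in reversed(postorder):       # reverse postorder
            if v == root:
                continue
            new_idom = None
            for p in preds[v]:
                if p in idom:               # only predecessors already processed
                    new_idom = p if new_idom is None else intersect(p, new_idom)
            if idom.get(v) != new_idom:
                idom[v] = new_idom
                changed = True
    return idom, passes
```

On the diamond 0→1, 0→2, 1→3, 2→3 this returns the root as every vertex's immediate dominator after 2 passes. The first pass finds the answer and the second only confirms it, which matches the behavior the text describes for reducible graphs.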

The last five columns show how many times on average a vertex is compared
to other vertices (the results do not include the initial DFS or BFS). The number
of comparisons is always proportional to the total running time; what varies is the
constant of proportionality, much smaller for simpler methods than for elaborate
ones. Iterative methods need many more comparisons, so their competitiveness
results mainly from smaller constants. For example, they need to maintain only
three arrays, as opposed to six or more for the other methods. (Two of these
arrays translate vertex numbers into DFS or BFS labels and vice versa.)



Table 4. Individual graphs (times for BFS in microseconds, all others relative to BFS).
The best result in each row is marked with an asterisk.

NAME      VERTICES    EDGES  BFS TIME    IDFS     IBFS       LT      SLT     SNCA
idfsquad      1501     2500        28  2735.3     21.0      8.6      4.2*    10.5
ibfsquad      5004    10003        88     4.9   9519.4      8.8      4.5      4.3*
itworst        401    10501        34  6410.5   6236.8      9.2      4.7*     4.7*
sltworst     32768    65534      2841   283.4    288.6      7.9*    11.0     10.5
sncaworst    10000    14999       179   523.2    243.8     12.1      8.3*   360.7
atis          4950   515080      2607     8.3     12.8      6.5      3.5      3.3*
nab         406555   939984     49048    17.6     15.6     12.8     11.6     10.2*
pw          330762   823330     42917    18.3     15.1     13.3     12.1     10.4*



We end our experimental analysis with results on artificial graphs. For each
graph, Table 4 shows the number of vertices and edges, the time for BFS (in
microseconds), and times for computing dominators (as multiples of BFS). The
first five entries represent the worst-case families described in Section 3. The last
three graphs have no special adversarial structure, but are significantly larger
than other graphs. As previously observed, the performance of iterative methods
tends to degrade more noticeably with size. SNCA and SLT remain the fastest
methods, but now LT comes relatively close. Given enough vertices, the
asymptotically better behavior of LT starts to show.

5 Final Remarks 

We compared five algorithms for computing dominators. Results on three classes 
of application graphs (program flow, VLSI circuits, and speech recognition) in- 
dicate that they have similar overall performance in practice. The tree-based 
iterative algorithms are the easiest to code and use less memory than the other 
methods, which makes them perform particularly well on small, simple graphs. 
Even on such instances, however, we did not observe the clear superiority of the 
original tree-based algorithm reported by Cooper et al. (our variants were not 
consistently better either). Both versions of LT and the hybrid algorithm are 
more robust on application graphs, and the advantage increases with graph size 
or graph complexity. Among these three, the sophisticated version of LT was the 
slowest, in contrast with the results reported by Lengauer and Tarjan [17]. The
simple version of LT and the hybrid algorithm were the most consistently fast
algorithms in practice; since the former is less sensitive to pathological instances,
it should be the method of choice.
Abstract. Our work is motivated by the problem of managing data on storage
devices, typically a set of disks. Such storage servers are used as web servers or 
multimedia servers, for handling high demand for data. As the system is running, it 
needs to dynamically respond to changes in demand for different data items. There 
are known algorithms for mapping demand to a layout. When the demand changes, 
a new layout is computed. In this work we study the data migration problem, 
which arises when we need to quickly change one layout to another. This problem 
has been studied earlier when for each disk the new layout has been prescribed. 
However, to apply these algorithms effectively, we identify another problem that 
we refer to as the correspondence problem, whose solution has a significant impact 
on the solution for the data migration problem. We study algorithms for the data 
migration problem in more detail and identify variations of the basic algorithm 
that seem to improve performance in practice, even though some of the variations 
have poor worst case behavior. 



1 Introduction 

To handle high demand, especially for multimedia data, a common approach is to repli- 
cate data objects within the storage system. Typically, a large storage server consists of 
several disks connected using a dedicated network, called a Storage Area Network. Disks 
typically have constraints on storage as well as the number of clients that can access data 
from a single disk simultaneously. The goal is to have the system automatically respond 
to changes in demand patterns and to recompute data layouts. Such systems and their 
applications are described and studied in, e.g., [3,8] and the references therein. 

Approximation algorithms have been developed [7,4] to map known demand for data 
to a specific data layout pattern to maximize utilization. (Utilization refers to the total 
number of clients that can be assigned to a disk that contains the data they want.) In the 
layout, we compute not only how many copies of each item we need, but also a layout 
pattern that specifies the precise subset of items on each disk. (This is not completely
accurate and we will elaborate on this step later.) The problem is NP-hard, but there
are polynomial time approximation schemes [4]. Given the relative demand for data, an
almost optimal layout can be computed.

Over time as the demand pattern changes, the system needs to create new data 
layouts. The problem we are interested in is the problem of computing a data migration 
plan for the set of disks to convert an initial layout to a target layout. We assume that data 
objects have the same size (these could be data blocks or files) and that it takes the same
amount of time to migrate a data item between any pair of disks. The crucial constraint 
is that each disk can participate in the transfer of only one item - either as a sender or 
as a receiver. Our goal is to find a migration schedule to minimize the time taken (i.e., 
number of rounds) to complete the migration (makespan) since the system is running 
inefficiently until the new layout has been obtained. 

To handle high demand for newly popular objects, new copies will have to be dynam-
ically created and stored on different disks. This means that we crucially need the ability
to have a “copy” operation in addition to “move” operations. In fact, one of the crucial
lower bounds used in the work on data migration [5] is based on a degree property of the
multigraph. For example, if the degree of a node is $\delta$, then this is a lower bound on the
number of rounds that are required, since in each round at most one transfer operation
involving this node may be done. For copying operations, clearly this lower bound is
not valid. For example, suppose we have a single copy of a data item on a disk. Suppose
we wish to create $\delta$ copies of this data item on $\delta$ distinct disks. Using the transfer graph
approach, we could specify a “copy” operation from the source disk to each of the $\delta$
disks. Notice that this would take at least $\delta$ rounds. However, by using newly created
copies as additional sources we can create $\delta$ copies in $\lceil \log(\delta + 1) \rceil$ rounds, as in the
classic problem of broadcasting by using newly created copies as sources for the data
object. (Each copy spawns a new copy in each round.)
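The doubling argument can be made concrete by generating the schedule. In the minimal sketch below, every disk that already holds a copy sends one new copy per round, so no disk takes part in more than one transfer per round. With $\delta = 5$ target disks it finishes in 3 rounds, the base-2 value of $\lceil \log(\delta + 1) \rceil$. The function name and the list-of-pairs output format are hypothetical.

```python
from collections import deque

def doubling_schedule(source, targets):
    """Broadcast one item from `source` to all `targets` by doubling.

    Returns a list of rounds; each round is a list of (sender, receiver)
    pairs, and every disk appears in at most one pair per round.
    """
    holders = [source]
    pending = deque(targets)
    rounds = []
    while pending:
        transfers = []
        for sender in holders:               # each holder sends at most once
            if not pending:
                break
            transfers.append((sender, pending.popleft()))
        holders.extend(receiver for _, receiver in transfers)
        rounds.append(transfers)
    return rounds
```

For example, `doubling_schedule(0, [1, 2, 3, 4, 5])` returns `[[(0, 1)], [(0, 2), (1, 3)], [(0, 4), (1, 5)]]`. The number of holders doubles from 1 to 2 to 4 before the last round.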

The most general problem of interest is the data migration problem with cloning [6] 
where data item i resides in a specified (source) subset Si of disks and needs to be moved 
to a (destination) subset Di. In other words, each data item that initially belongs to a 
subset of disks, needs to be moved to another subset of disks. (We might need to create 
new copies of this data item and store them on an additional set of disks.) 

1.1 The Correspondence Problem 

Given a set of data objects placed on disks, we shall assume that what is important is 
the grouping of the data objects and not their exact location on each disk. For example, 
we can represent a disk by the set {A, B, C} indicating that data objects A, B, and C 
are stored on this disk. If we move the location of these objects on the same disk, it does 
not affect the set corresponding to the disk in any way. 

Data layout algorithms (such as the ones in [7,4]) take as input a demand distribution
for a set of data objects and output a grouping $S'_1, S'_2, \ldots, S'_N$ as a desired data layout
pattern on disks $1', 2', \ldots, N'$. (The current layout is assumed to be $S_1, S_2, \ldots, S_N$.) Note
that we do not need the data corresponding to the set of items $S'_1$ to be on (original)
disk 1. For example, the algorithm simply requires that a new grouping be obtained
where the items in set $S'_1$ are grouped together on a disk. For instance, if $S_3 = S'_1$, then
by simply “renaming” disk 3 as disk $1'$ we have obtained a disk with the set of items
$S'_1$, assuming that these two disks are interchangeable. We need to compute a perfect
matching between the initial and final layout sets. An edge is present between $S_i$ and
$S'_j$ if disk $i$ has the same capabilities as disk $j'$. The weight of this edge is given
by the number of “new items” that need to be moved to $S_i$ to obtain $S'_j$. A minimum
weight perfect matching in this graph gives the correspondence that minimizes the total
number of changes, but not the number of rounds. Once we fix the correspondence, we
need to invoke an algorithm to compute a migration schedule to minimize the number of
rounds. Since this step involves solving an NP-hard problem, we will use a polynomial
time approximation algorithm for computing the migration. However, we still need to
pick a certain correspondence before we can invoke a data migration algorithm.

Fig. 1. A bad correspondence can yield a poor solution (the left figure), while a good correspon-
dence can yield significantly better solutions for data movement (the right figure).

There are two central questions in which we are interested; these will be answered 
in Section 5: 

- Which correspondence algorithm should we use? We will explore algorithms 
based on computing a matching of minimum total weight and matchings where we 
minimize the weight of the maximum weight edge. Moreover, the weight function 
will be based on estimates of how easy or difficult it is to obtain copies of certain data. 

- How good are our data migration algorithms once we fix a certain corre- 
spondence? Even though we have bounds on the worst case performance of the 
algorithm, we would like to find whether or not its performance is a lot better 
than the worst case bound. (We do not have any example showing that the bound 
is tight.) In fact, it is possible that other heuristics perform extremely well, even
though they do not have good worst-case bounds [6].

For example, in the left figure of Figure 1 we illustrate a situation where we have 5 
disks with the initial and final configurations as shown. By picking the correspondence as 
shown, we end up with a situation where all the data on the first disk needs to be changed. 
We have shown the possible edges that can be chosen in the transfer graph along with the 
labels indicating the data items that we could choose to transfer from the source disk to the 
destination disk. The final transfer graph shown is a possible output of a data migration 
algorithm. This will take 5 rounds since all the data is coming to a single disk; node 1 will 
have a high in-degree. Item V can be obtained from tertiary storage, for example (or an- 
other device). Clearly, this set of copy operations will be slow and will take many rounds. 
On the other hand, if we use the correspondence shown by the dashed edges in
the right figure of Figure 1, we obtain a transfer graph where each disk needs only one
new data item, and such a transfer can be achieved in two rounds in parallel. (The set of
transfers performed by the data migration algorithm is shown.)

1.2 Contributions 

In recent work [6], it was shown that the data migration with cloning problem is NP-hard 
and has a polynomial time approximation algorithm (KKW) with a worst case guarantee 
of 9.5. Moreover, the work also explored a few simple data migration algorithms. Some 
of the algorithms cannot provide a constant approximation guarantee, while for some
of the algorithms no approximation guarantee is known. In this paper, we conduct an 
extensive study of these data migration algorithms’ performance under different changes 
in user access patterns. We found that although the worst-case performance of the KKW 
algorithm is 9.5, in the experiments the number of rounds required is less than 3.25 
times the lower bound. We identify the correspondence problem, and show that a good 
correspondence solution can improve the performance of the data migration algorithms 
by a factor of 4.4, relative to a bad solution. More detailed observations and results of 
the study are given in Section 5. 

2 Models and Definitions 

In the data migration problem, we have $N$ disks and $\Delta$ data items. For each item $i$, there
are subsets of disks $S_i$ and $D_i$. Initially only the disks in $S_i$ have item $i$, and all disks
in $D_i$ want to receive $i$. Note that after a disk in $D_i$ receives item $i$, it can be a source
of item $i$ for other disks in $D_i$ which have not received the item yet. Our goal is to find
a schedule that minimizes the number of rounds. Our algorithms make use of known results
on edge coloring of multigraphs. Given a graph $G$ with maximum degree $\Delta_G$ and multiplicity
$\mu$, it is well known that it can be colored with at most $\Delta_G + \mu$ colors.



3 Correspondence Problem Algorithms 

To match disks in the initial layout with disks in the target layout, we tried the following 
methods, after creating a bipartite graph with two copies of disks: 

1. (Simple min max matching) The weight of matching disk $p$ in the initial layout with
disk $q$ in the target layout is the number of new items that disk $q$ needs to get from
other disks (because disk $p$ does not have these items). Find a perfect matching that
minimizes the maximum weight of the edges in the matching. Effectively, it pairs
disks in the initial layout with disks in the target layout such that the maximum number
of items any disk needs to receive is minimized.

2. (Simple min sum matching) Minimum weighted perfect matching using the weight
function defined in 1. This method minimizes the total number of transfer operations.

3. (Complex min sum matching) Minimum weighted perfect matching with another
weight function that takes the ease of obtaining an item into account. Note that the
larger the ratio of $|D_i|$ to $|S_i|$, the more copying is required. Suppose disk $p$ in the
initial layout is matched with disk $q$ in the target layout, and let $S$ be the set of items
that disk $q$ needs which are not on disk $p$. The weight corresponding to matching
these two disks is $\max_{i \in S} (\log(|D_i| / |S_i|) + 1)$.

4. Direct correspondence. Disk i in the initial layout is always matched with disk i in 
the target layout. 

5. Random permutation. 
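Methods 1, 2, and 4 can be compared on toy layouts with a brute-force sketch. It assumes all disks have the same capabilities, so any initial disk may be matched with any target disk. The edge weight is the number of items the target disk needs that the initial disk lacks. Trying all $N!$ permutations is only for tiny examples. For realistic $N$ one would use a polynomial-time assignment algorithm instead, for example the Hungarian method for the min-sum case. Function names are hypothetical, and methods 3 and 5 are not shown.

```python
from itertools import permutations

def new_items(initial_disk, target_disk):
    """Edge weight: items the target disk needs that the initial disk lacks."""
    return len(target_disk - initial_disk)

def best_correspondence(initial, target, objective=sum):
    """Brute-force perfect matching (only for tiny instances).

    objective=sum gives the min-sum matching (method 2);
    objective=max gives the min-max matching (method 1).
    Returns (cost, match), where match[p] is the target disk for initial disk p.
    """
    n = len(initial)
    best = None
    for match in permutations(range(n)):
        cost = objective(new_items(initial[p], target[match[p]]) for p in range(n))
        if best is None or cost < best[0]:
            best = (cost, match)
    return best

def direct_cost(initial, target, objective=sum):
    """Method 4: disk i in the initial layout is matched with disk i."""
    return objective(new_items(a, b) for a, b in zip(initial, target))
```

If two disks merely swap their contents, the direct correspondence moves every item. The matching instead finds the renaming that needs no transfers at all.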

4 Data Migration Algorithms 

4.1 KKW Data Migration Algorithm [6] 

We now describe the KKW data migration algorithm. Define $\beta_j$ as $|\{i \mid j \in D_i\}|$, i.e., the
number of different sets $D_i$ to which a disk $j$ belongs. We then define $\beta$ as $\max_{j=1,\ldots,N} \beta_j$.
In other words, $\beta$ is an upper bound on the number of items a disk may need. Moreover,
we may assume that $D_i \neq \emptyset$ and $D_i \cap S_i = \emptyset$. (We simply define the destination set $D_i$
as the set of disks that need item $i$ and do not currently have it.) Here we only give a high
level description of the algorithm. The details can be found in a paper by Khuller, Kim,
and Wan [6].

1. (Choose sources) For an item $i$ decide a unique source $s_i \in S_i$ so that
$\alpha = \max_{j=1,\ldots,N} (|\{i \mid j = s_i\}| + \beta_j)$ is minimized. In other words, $\alpha$ is the maximum
number of items for which a disk may be a source ($s_i$) or a destination.

2. (Large destination sets) Find a transfer graph for items such that $|D_i| \geq \beta$.

a) (Choose representatives) We first compute a disjoint collection of subsets $G_i$,
$i = 1, \ldots, \Delta$. Moreover, $G_i \subseteq D_i$ and $|G_i| = \lfloor |D_i| / \beta \rfloor$.

b) (Send to representatives) We have each item $i$ sent to the set $G_i$.

c) (From representatives to destination sets) We now create a transfer graph as
follows. Each disk is a node in the graph. We add directed edges from disks in
$G_i$ to $(\beta - 1)|G_i|$ disks in $D_i \setminus G_i$ such that the out-degree of each node in
$G_i$ is at most $\beta - 1$ and the in-degree of each node in $D_i \setminus G_i$ is 1. We redefine
$D_i$ as a set of $|D_i \setminus G_i| - (\beta - 1)|G_i|$ disks which do not receive item $i$, so
that they can be taken care of in Step 3. The redefined set $D_i$ has size $< \beta$.

3. (Small destination sets) Find a transfer graph for items such that $|D_i| < \beta$.

a) (Find new sources in small sets) For each item $i$, find a new source $s'_i$ in $D_i$.
A disk $j$ can be a source $s'_i$ for several items as long as $\sum_{i \in I_j} |D_i| \leq 2\beta - 1$,
where $I_j$ is the set of items of which $j$ is a new source.

b) Send each item $i$ from $s_i$ to $s'_i$.

c) Create a transfer graph. We add a directed edge from the new source of item $i$
to all disks in $D_i \setminus \{s'_i\}$.

4. We now find an edge coloring of the transfer graph obtained by merging two transfer 
graphs in Steps 2(c) and 3(c). The number of colors used is an upper bound on the 
number of rounds required to ensure that each disk in Dj receives item j. 
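The quantities that drive Steps 2 and 3 can be computed in a few lines. The sketch below derives $\beta_j$ and $\beta$ from the destination sets and classifies each item as large or small. For large items it reports $|G_i|$, the number of disks reached from the representatives in Step 2(c), and the size of the leftover set passed to Step 3. It assumes $|G_i| = \lfloor |D_i| / \beta \rfloor$, the choice for which the leftover $|D_i| - \beta |G_i|$ is smaller than $\beta$. It does not pick sources or representatives and does not build colorings. The dict-of-sets input and function name are hypothetical.

```python
from collections import Counter

def kkw_group_sizes(dest):
    """dest maps item i -> set D_i of destination disks (each non-empty).

    Returns (beta, plan) where plan[i] = (kind, |G_i|, served, leftover).
    """
    beta_j = Counter(j for D in dest.values() for j in D)  # beta_j = |{i : j in D_i}|
    beta = max(beta_j.values())
    plan = {}
    for i, D in dest.items():
        if len(D) >= beta:                  # Step 2: large destination set
            g = len(D) // beta              # assumed |G_i| = floor(|D_i| / beta)
            served = (beta - 1) * g         # disks reached from G_i in Step 2(c)
            plan[i] = ("large", g, served, len(D) - g - served)
        else:                               # Step 3: small destination set
            plan[i] = ("small", 0, 0, len(D))
    return beta, plan
```

With destination sets A = {1, 2, 3, 4, 5}, B = {1, 2}, and C = {3}, every one of disks 1 to 3 needs two items, so $\beta = 2$. Item A then uses 2 representatives that reach 2 more disks, leaving 1 disk for Step 3. Item C is handled entirely as a small set.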

Theorem 1. (Khuller, Kim, and Wan [6]) The KKW algorithm described above has a 
worst-case approximation ratio of 9.5. 
4.2 KKW Data Migration Algorithm Variants

The 9.5-approximation algorithm for data migration (KKW (Basic)) uses several compli-
cated components to achieve the constant factor approximation guarantee. We consider
simpler variants of these components. The variants may not give good theoretical bounds.

(a) In Step 2(a) (Choose representatives) we find the minimum integer $m$ such that there
exist disjoint sets of size $\lfloor |D_i| / m \rfloor$. The value of $m$ should be between $\bar{\beta}$ and $\beta$.

(b) In Step 2(b) (Send to representatives) we use a simple doubling method to satisfy
all requests in $G_i$. Since all groups are disjoint, it takes $\max_i \log |G_i|$ rounds.

(c) In Step 3 (Small destination sets) we do not find a new source $s'_i$. Instead, $s_i$ sends
item $i$ to $D_i$ directly for small sets. We try to find a schedule which minimizes the
maximum total degree of disks in the final transfer graph in Step 4.

(d) In Step 3(a) (Find new sources in small sets), when we find a new source $s'_i$, disks in
$S_i$ can be candidates as well as disks in $D_i$. If $s'_i \in S_i$, then we can save some rounds
to send item $i$ from $s_i$ to $s'_i$.

4.3 Heuristic-Based Data Migration Algorithms [6] 

1. [Edge Coloring] We can hnd a transfer graph, given the initial and target layouts. 
In the transfer graph, we have a set of nodes for disks and there is an edge between 
node j and k if for each item i, j g Si and k € Di. Then we hnd an edge coloring 
of the transfer graph to obtain a valid schedule, where the number of colors used is 
an upper bound on the total number of rounds. 

2. [ Unweighted Matching] Find an unweighted matching in the transfer graph, schedule 
them, then create a new transfer graph based on the new Si and Di, and repeat. 

3. [Weighted Matching] Repeatedly hnd and remove a weighted matching from the 
(repeatedly re-created) transfer graph, where the weight between disks v and w is 
maxi(l + log over all items i where v G Si,w G Di, or w G Si,v G Di. 

4. [Item by Item] We process each item i sequentially and satisfy the demand by 
doubling the number of copies of an item in each round. 
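Two of the heuristics above are simple enough to sketch directly. The Python below is an illustration under our own assumptions, not the authors' code: the transfer graph is a list of (source disk, destination disk) edges, and `greedy_edge_coloring` (a hypothetical name) gives each edge the smallest color not yet used at either endpoint. This greedy rule is a simple stand-in for whatever edge-coloring routine was used in the experiments; each color class contains no disk twice, so it can run as one round. `item_by_item_rounds` (also hypothetical) counts rounds for the item-by-item heuristic, assuming S_i and D_i are disjoint and every holder of an item sends one copy per round.

```python
from collections import defaultdict


def greedy_edge_coloring(edges):
    """Color transfer edges so that no disk has two edges of the same color.

    `edges` is a list of (u, v) disk pairs; parallel edges are allowed.
    Each edge gets the smallest color not yet used at u or at v, so at most
    2 * max_degree - 1 colors are used.  Color c means "round c + 1".
    """
    used = defaultdict(set)  # disk -> colors already taken at that disk
    colors = []
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        used[u].add(c)
        used[v].add(c)
        colors.append(c)
    return colors


def item_by_item_rounds(items):
    """Rounds used by the item-by-item heuristic.

    `items` is a list of (|S_i|, |D_i|) pairs with S_i and D_i assumed
    disjoint.  Items are handled one after another; within an item every
    disk holding a copy sends it to one new disk per round (doubling).
    """
    total = 0
    for sources, demands in items:
        if sources < 1:
            raise ValueError("every item needs at least one source copy")
        copies, target = sources, sources + demands
        while copies < target:
            copies = min(2 * copies, target)
            total += 1
    return total
```

On a triangle of disks 0, 1, 2 plus a separate edge between disks 3 and 4, `greedy_edge_coloring([(0, 1), (0, 2), (1, 2), (3, 4)])` returns `[0, 1, 2, 0]`, i.e., a three-round schedule. Because items are processed strictly one after another, `item_by_item_rounds` adds up per-item doubling times, which illustrates why this heuristic cannot exploit parallelism across items.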

5 Experiments 

The framework of our experiments is as follows: 

1. (Create an initial layout) Run the sliding window algorithm [4], given the number of user requests for each data object. In Section 5 we describe the distributions we used in generating user requests. These distributions are completely specified once we fix the ordering of data objects in order of decreasing demand.

2. (Create a target layout) Shuffle the ranking of items. Generate the new demand for 
each item according to the probabilities corresponding to the new ranking of the 
item. To obtain a target layout, take one of the following approaches. 

a) Run the sliding window algorithm again with the new request demands. 
b) Use methods other than sliding window to create a target layout. The motivation for exploring these methods is twofold: performance issues (as explained later in the paper), and the possibility that algorithms other than sliding window could perform well in creating layouts. The methods considered here are as follows.

i. Rotation of items: Suppose we number the items in non-increasing order of the number of copies in the initial layout. We make a sorted list of items of size k = …, and let the list be l_1, l_2, …, l_k. Item l_i in the target layout will occupy the positions of item l_{i+1} in the initial layout, while item l_k in the target layout will occupy the positions of item l_1 in the initial layout. In other words, the numbers of copies of items l_1, …, l_{k−1} are decreased slightly, while the number of copies of item l_k is increased significantly.

ii. Enlarging D_i for items with small S_i: Repeat the following … times. Pick an item s randomly having only one copy in the current layout. For each item i that has more than one copy in the current layout, there is a probability of 0.5 that item i will randomly give up the space of one of its copies, and the space will be allocated to item s in the new layout for the next iteration. In other words, if there are k items having more than one copy at the beginning of this iteration, then item s is expected to gain k/2 copies at the end of the iteration.

3. (Find a correspondence) Run the different correspondence algorithms given in Section 3 to match each disk in the initial layout with a disk in the target layout. We can then find the set of source and destination disks for each item.

4. (Compute a migration schedule) Run the different data migration algorithms, and record the number of rounds needed to finish the migration.
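The two layout-derivation methods of Step 2(b) can be sketched in a few lines. We assume, purely for illustration, that a layout is a dict mapping each item to the list of slots (for example (disk, position) pairs) holding its copies. `rotate_items` (a hypothetical name) implements method i for a given sorted list l_1, …, l_k; the choice of k is not reproduced here. `enlarge_item` (also hypothetical) performs one iteration of method ii with a seeded random generator; which copy an item gives up is chosen uniformly at random, which is our assumption.

```python
import random


def rotate_items(layout, order):
    """Method i: item order[j] takes the slots that order[j + 1] held in the
    initial layout, and the last item order[-1] takes the slots of order[0].

    `order` lists l_1, ..., l_k in non-increasing number of copies.
    Items not in `order` keep their slots.
    """
    target = {item: list(slots) for item, slots in layout.items()}
    k = len(order)
    for j, item in enumerate(order):
        donor = order[(j + 1) % k]  # l_k wraps around to l_1
        target[item] = list(layout[donor])
    return target


def enlarge_item(layout, rng):
    """Method ii, one iteration: a random single-copy item s receives, with
    probability 0.5, one slot from every item that has more than one copy."""
    singles = [i for i, slots in layout.items() if len(slots) == 1]
    if not singles:
        return layout
    s = rng.choice(singles)
    new = {item: list(slots) for item, slots in layout.items()}
    for item, slots in layout.items():
        if len(slots) > 1 and rng.random() < 0.5:
            # assumption: the freed copy is picked uniformly at random
            freed = new[item].pop(rng.randrange(len(new[item])))
            new[s].append(freed)
    return new
```

With `layout = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}` and order `["a", "b", "c"]`, rotation gives `a` the slots `[4, 5]`, `b` the slot `[6]`, and `c` the slots `[1, 2, 3]`: the most-replicated items shrink slightly and the last one grows, as described in method i.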

User Request Distributions. We generate the number of requests for different data 
objects using Zipf distributions and Geometric distributions. We note that few large- 
scale measurement studies exist for the applications of interest here (e.g., video-on- 
demand systems), and hence below we are considering several potentially interesting 
distributions. Some of these correspond to existing measurement studies (as noted below) 
and others we consider to explore the performance characteristics of our algorithms and 
to further improve the understanding of such algorithms. For instance, a Zipf distribution 
is often used for characterizing people’s preferences. 

Zipf Distribution. The Zipf distribution is defined as follows:

$$\Pr(\text{request for movie } i) = \frac{c}{i^{1-\theta}}, \qquad \forall i = 1, \ldots, M, \quad 0 \le \theta \le 1,$$

where $c = 1 / \sum_{j=1}^{M} j^{-(1-\theta)}$ and θ determines the degree of skewness. We set θ to 0 (similar to the measurements performed in [2]) and to 0.5 in our experiments below.

Geometric Distribution. We also tried Geometric distributions in order to investigate how a more skewed distribution affects the performance of the data migration algorithms. The distribution is defined as follows:

$$\Pr(\text{request for movie } i) = (1-p)^{i-1}\, p, \qquad \forall i = 1, \ldots, M, \quad 0 < p < 1,$$

where we set p to 0.25 and 0.5 in our experiments below.
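Both request distributions are easy to tabulate and sample. The sketch below is ours, not the authors' generator: `zipf_probs`, `geometric_probs` and `sample_requests` are hypothetical names, and drawing a fixed number of independent requests is an assumption about how request counts were produced. The Zipf constant c is the normalizer that makes the probabilities sum to one. Over a finite M the geometric probabilities sum to 1 − (1 − p)^M rather than 1; `random.choices` treats its weights as relative, so sampling is unaffected.

```python
import random


def zipf_probs(M, theta):
    """Pr(movie i) = c / i**(1 - theta) for i = 1..M, c chosen to normalize."""
    weights = [1.0 / i ** (1.0 - theta) for i in range(1, M + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


def geometric_probs(M, p):
    """Pr(movie i) = (1 - p)**(i - 1) * p for i = 1..M (not renormalized)."""
    return [(1.0 - p) ** (i - 1) * p for i in range(1, M + 1)]


def sample_requests(probs, n, rng):
    """Draw n independent requests; return per-movie counts (index 0 = rank 1)."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts
```

With θ = 0 the most popular movie is requested twice as often as the second one, whereas with p = 0.5 the top movie alone already receives half of all requests, which is why the Geometric setting produces more cloning.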

Shuffling methods. We use the following shuffling methods in Step 2(a) above. 

1. Randomly promote 20% of the items. Each chosen item of rank i is promoted to a random rank between 1 and i − 1.

2. Promote the least popular item to the top, and demote all other items by one rank. 
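The two shuffling methods can be sketched as operations on a ranking list (index 0 = most popular). This is a minimal illustration under our own assumptions: `promote_random` and `promote_last` are hypothetical names, chosen items are processed in random order using their current rank, and the new rank is drawn uniformly from 1 to i − 1.

```python
import random


def promote_random(ranking, fraction, rng):
    """Method 1: promote about `fraction` of the items.  An item currently at
    rank i (1-based) moves to a uniformly random rank in 1..i-1 (assumption)."""
    order = list(ranking)
    chosen = rng.sample(order, round(fraction * len(order)))
    for item in chosen:
        pos = order.index(item)  # 0-based position, i.e. rank pos + 1
        if pos == 0:
            continue  # already on top: no higher rank exists
        order.pop(pos)
        order.insert(rng.randrange(pos), item)  # new rank in 1..pos
    return order


def promote_last(ranking):
    """Method 2: the least popular item goes to the top; all others drop one rank."""
    return [ranking[-1]] + list(ranking[:-1])
```

For example, `promote_last(["a", "b", "c", "d"])` returns `["d", "a", "b", "c"]`. Both methods only permute the ranking; the new demands are then drawn from the distribution using the new ranks.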
Parameters and Layout Creation. We now describe the parameters used in the experiments, namely the number of disks, space capacity, and load capacity. We ran a number of experiments with 60 disks. For each correspondence method, user request distribution, and shuffling method, we generated 20 inputs (i.e., 20 sets of initial and target layouts) for each set of parameters, and ran the different data migration algorithms on those instances. For the Zipf distribution we used θ values of 0 and 0.5, and for the Geometric distribution we used p values of 0.25 and 0.5.

We tried three different pairs of settings for space and load capacities, namely: (A) 15 and 40, (B) 30 and 35, and (C) 60 and 150. We obtained these numbers from the specifications of modern SCSI hard drives. We show the results of 5 different layout creation schemes, which use different combinations of methods and parameters to create the initial and target layouts:
(I) promoting the last item to the top, Zipf distribution (θ = 0);
(II) promoting 20% of items, Zipf distribution (θ = 0);
(III) promoting the last item to the top, Geometric distribution (p = 0.5);
(IV) initial layout obtained from the Zipf distribution (θ = 0), target layout obtained from the method described in Step 2(b)i in Section 5 (rotation of items); and
(V) initial layout obtained from the Zipf distribution (θ = 0), target layout obtained from the method described in Step 2(b)ii in Section 5 (enlarging D_i for items with small S_i).
We also tried promoting 20% of items with the Geometric distribution. We omit these results since they were qualitatively similar.

Results 

In the tables we present the average for 20 inputs. Moreover, we present results of two 
representative inputs individually, to illustrate the performance of the algorithms under
the same initial and target layouts. This presentation is motivated as follows. The absolute 
performance of each run is largely a function of the differences between the initial and 
the target layouts (and this is true for all algorithms). That is, a small difference is likely 
to result in relatively few rounds needed for data migration, and a large difference is 
likely to result in relatively many rounds needed for data migration. Since a goal of 
this study is to understand the relative performance differences between the algorithms 
described above, i.e., given the same initial and target layouts, we believe that presenting 
the data on a per run basis is more informative. That is, considering the average alone 
somewhat obscures the characteristics of the different algorithms.

Different correspondence methods. We first investigate how different correspon- 
dence methods affect the performance of the data migration algorithms. Figures 2 and 3 
show the ratio of the number of rounds taken by different data migration algorithms to 
the lower bounds, averaged over 20 inputs under parameter setting (A). We observed that 
the simple min max matching (1) always returns the same matching as the simple min 
sum matching (2) in all instances we tried. Moreover, using a simpler weight function 
(2) or a more involved one (3) does not seem to affect the behavior in any significant 
way (often these matchings are the same). Thus we only present results using simple 
min max matching, and label this as matching-based correspondence method. 

From the figures, we found that using a matching-based method is important and can affect the performance of all algorithms by up to a factor of 4.4 compared to a bad correspondence, such as a random permutation. Since direct correspondence does not perform as well as other weight-based matchings, this also suggests that
Fig. 2. The ratio of the number of rounds taken by
the algorithms to the lower bound (28.1 rounds), 
averaged over 20 inputs, using parameter setting 
(A) and layout creation scheme (I). 
Fig. 3. The ratio of the number of rounds taken
by the algorithms to the lower bound (6.0 
rounds), averaged over 20 inputs, using pa- 
rameter setting (A) and layout creation scheme 
(II). 



a good correspondence method is important. However, the performance of direct correspondence was reasonable when we promoted the popularity of one item. This can be explained by the fact that in this case sliding window obtains a target layout that is fairly similar to the initial layout.

Different data migration algorithms. Given the results above, it is reasonable to evaluate the performance of the different data migration algorithms using only the simple min max matching correspondence method. We now compare the performance of the different variants of the KKW algorithm. Consider algorithm KKW (c), which modifies Step 3 (where we want to satisfy small D_i); we found that sending the items from S_i to small D_i directly using edge coloring, without using new sources s', is a much better idea. Even though this makes the algorithm an O(Δ) approximation algorithm, the performance is very good under both the Zipf and the Geometric distributions, since the sources are not concentrated on one disk and only a few items require rapid doubling.

In addition, we thought that making the sets G_i slightly larger, as in algorithm KKW (a), which modifies Step 2(a), was a better idea. This reduces the average degree of nodes in later steps, such as Step 2(c) and Step 3, where we send the item to disks in D_i \ G_i. However, the experiments show that it usually performs slightly worse than the algorithm using β.

Consider algorithm KKW (d), which modifies Step 3(a) (where we identify new sources); we found that the performance of the variant that includes S_i, in addition to D_i, as candidates for the new source s' is mixed. Sometimes it is better than the basic KKW algorithm, but more often it is worse.

Consider algorithm KKW (b) which modifies Step 2(b) (where we send the items 
from the sources Si to Gi); we found that doing a simple broadcast is generally a better 
Table 1. The ratio of the number of rounds taken by the data migration algorithms to the lower bounds, using the min max matching correspondence method with layout creation scheme (III) (i.e., promoting the last item to the top, with user requests following the Geometric distribution (p = 0.5)), under different parameter settings.



Parameter setting:            (A)                     (B)
Instance:                  1      2    Ave         1      2    Ave
KKW (Basic)            2.000  2.167  2.250     1.611  1.611  1.568
KKW (b)                1.875  2.167  2.167     1.611  1.611  1.568
KKW (c)                1.625  2.000  2.000     1.444  1.222  1.286
KKW (a & c)            1.750  2.000  2.050     1.333  1.333  1.296
KKW (a, b & c)         1.625  2.000  1.917     1.278  1.278  1.261
KKW (d)                1.750  2.333  2.433     1.722  1.500  1.533
KKW (b & d)            1.875  2.333  2.483     1.556  1.556  1.538
Edge Coloring          3.875  5.667  5.617     2.000  2.111  1.980
Unweighted Matching    1.000  1.500  1.500     1.111  1.111  1.116
Weighted Matching      1.000  1.500  1.433     1.056  1.111  1.111
Random Weighted        1.000  1.333  1.367     1.056  1.056  1.010
Item by Item           6.250 12.833 12.683     3.667  3.667  5.719



Table 2. The ratio of the number of rounds taken by the data migration algorithms to the lower bounds, using the min max matching correspondence method with layout creation scheme (IV) (i.e., target layout obtained from the method described in Step 2(b)i in Section 5 (rotation of items)), under different parameter settings.



Parameter setting:            (A)                     (B)                     (C)
Instance:                  1      2    Ave         1      2    Ave         1      2    Ave
KKW (Basic)            2.400  2.000  1.907     1.417  1.444  1.553     1.333  1.250  1.330
KKW (b)                2.200  2.000  1.889     1.417  1.444  1.553     1.333  1.250  1.330
KKW (c)                2.000  1.600  1.722     1.167  1.111  1.289     1.222  1.000  1.107
KKW (a & c)            1.800  2.000  1.704     1.250  1.333  1.408     1.222  1.083  1.170
KKW (a, b & c)         2.000  2.000  1.704     1.333  1.333  1.408     1.222  1.083  1.170
KKW (d)                2.400  2.400  1.963     1.333  1.444  1.513     1.556  1.083  1.348
KKW (b & d)            2.200  2.200  2.000     1.250  1.667  1.658     1.556  1.333  1.393
Edge Coloring          1.800  2.000  1.778     1.000  1.000  1.250     1.222  1.000  1.125
Unweighted Matching    1.000  1.000  1.093     1.000  1.000  1.026     1.000  1.000  1.000
Weighted Matching      1.000  1.200  1.074     1.000  1.000  1.013     1.000  1.000  1.009
Random Weighted        1.000  1.200  1.056     1.000  1.000  1.013     1.000  1.000  1.009
Item by Item          10.000  9.800  8.889     6.333  8.111  9.592    15.778 12.000 12.536



idea, as we can see from the results for Parameter Setting (A), Instance 1 in Table 1 and for Parameter Setting (A), Instance 1 in Table 2.

Even though this makes the algorithm an O(log n) approximation algorithm, max_i |G_i| is very rarely large. Under the inputs generated by the Zipf and Geometric distributions, the simple broadcast generally performs the same as the basic KKW algorithm, since max_i |G_i| is very small.

Out of all the heuristics, the matching-based heuristics perform very well in practice. The only cases where they perform extremely badly correspond to bad examples that we hand-crafted. Suppose, for example, that a set of Δ disks are the sources for Δ items (each disk has all Δ items). Suppose further that the destination sets also have size Δ each and are disjoint. The results are shown in Figure 5. The KKW algorithm sends each of the items to one disk in D_i in the very first round. After that, a broadcast can be done within the destination sets, as they are disjoint, which takes O(log Δ) rounds in total. Therefore the ratio of the number of rounds used to the lower bound remains constant. The matching-based algorithm can take up to Δ rounds, as it can focus on sending item i in each round by computing a perfect matching of size Δ between the source disks and the destination disks for item i. Since every perfect matching has the same weight in this case, the matching may focus on sending only one item in each round. We implemented a variant of the weighted matching heuristic that gets around this problem by adding a very small random weight to each edge in the graph. As we can see from Figure 5, although this variant does not perform as well as the KKW algorithm, it performs much better than the other matching-based data migration algorithms. Moreover, we ran this variant on the inputs generated from the Zipf and Geometric distributions, and found that it frequently takes the same number of rounds as the weighted matching heuristic. In some cases it performs better than the weighted matching heuristic, while in a few cases its performance is worse.

[Figure 4: bar chart. Legend, in bar order: KKW (Basic), KKW (b), KKW (c), KKW (a & c), KKW (a, b & c), KKW (d), KKW (b & d), Edge Coloring, Unweighted Matching, Weighted Matching, Random Weighted.]

Fig. 4. The ratio of the number of rounds taken by the algorithms to the lower bound, averaged over 20 inputs, using the min max matching correspondence method and parameter setting (A), under different layout creation schemes (see Section 5). Note that the order of the bars in each cluster is the same as that in the legend.

Fig. 5. The ratio of the number of rounds taken by the algorithms to the lower bound in the worst case.
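The failure mode of the matching heuristics on this hand-crafted instance can be reproduced with a small simulation. The sketch below is an illustration under our own assumptions, not the authors' implementation: a greedy maximal matching stands in for the weighted matching computation, scanning edges item by item mimics a matching routine that happens to focus on one item when all perfect matchings have equal weight, and shuffling the edge order plays the role of the tiny random edge weights. All names (`worst_case_instance`, `matching_rounds`) are hypothetical. A disk that receives item i becomes a holder of item i, and each disk takes part in at most one transfer per round.

```python
import random


def worst_case_instance(delta):
    """Delta source disks each hold all delta items; item i must reach its
    own disjoint set of delta destination disks."""
    sources = [("src", j) for j in range(delta)]
    holders = {i: set(sources) for i in range(delta)}
    pending = {i: {("dst", i, j) for j in range(delta)} for i in range(delta)}
    return holders, pending


def matching_rounds(holders, pending, rng=None):
    """Repeatedly schedule a greedy maximal matching of (holder, destination)
    transfers until every destination has its item.

    rng=None: edges are scanned item by item (ties broken by item index).
    rng given: the edge order is shuffled, standing in for tiny random weights.
    """
    holders = {i: set(h) for i, h in holders.items()}
    pending = {i: set(d) for i, d in pending.items()}
    rounds = 0
    while any(pending.values()):
        edges = [(h, d, i) for i in sorted(pending)
                 for h in sorted(holders[i]) for d in sorted(pending[i])]
        if rng is not None:
            rng.shuffle(edges)
        busy, chosen = set(), []
        for h, d, i in edges:
            if h not in busy and d not in busy:  # one transfer per disk per round
                busy.update((h, d))
                chosen.append((d, i))
        for d, i in chosen:  # the receiving disk now holds item i
            holders[i].add(d)
            pending[i].discard(d)
        rounds += 1
    return rounds
```

On this instance the item-ordered run takes exactly Δ rounds, since each round saturates all Δ sources with transfers of a single item. The shuffled run can never take more than Δ rounds: in every unfinished round some destination is unmatched, which forces all Δ sources to be busy, so at least Δ of the Δ² required transfers happen per round.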

Given the performance of different data migration algorithms illustrated in Figure 4 
and in the tables, the two matching-based heuristics are comparable. Matching-based 
heuristics perform the best in general, followed by the edge-coloring heuristic, followed 
by the KKW algorithms, while processing items one-by-one comes last. The main reason 
why the edge-coloring heuristic performs better than the basic KKW algorithm is that the 
Table 3. The ratio of the number of rounds taken by the data migration algorithms to the lower bounds, using the min max matching correspondence method with layout creation scheme (V) (i.e., target layout obtained from the method described in Step 2(b)ii in Section 5 (enlarging D_i for items with small S_i)), under different parameter settings.



Parameter setting:            (A)                     (B)                     (C)
Instance:                  1      2    Ave         1      2    Ave         1      2    Ave
KKW (Basic)            1.000  1.333  1.333     1.000  1.000  1.114     1.167  1.000  1.140
KKW (b)                1.000  1.333  1.222     1.000  1.000  1.114     1.167  1.000  1.140
KKW (c)                1.000  1.333  1.296     1.000  1.000  1.086     1.000  1.000  1.093
KKW (a & c)            1.000  1.333  1.296     1.000  1.000  1.086     1.167  1.000  1.116
KKW (a, b & c)         1.000  1.333  1.185     1.000  1.000  1.086     1.167  1.000  1.093
KKW (d)                1.000  1.667  1.481     1.000  1.500  1.286     1.500  1.750  1.488
KKW (b & d)            1.000  1.667  1.481     1.000  1.500  1.286     1.667  1.750  1.512
Edge Coloring          1.000  1.000  1.148     1.000  1.000  1.057     1.000  1.000  1.047
Unweighted Matching    1.000  1.000  1.111     1.000  1.000  1.029     1.000  1.000  1.047
Weighted Matching      1.000  1.000  1.111     1.000  1.000  1.029     1.000  1.000  1.047
Random Weighted        1.000  1.000  1.111     1.000  1.000  1.029     1.000  1.000  1.047
Item by Item           7.000  5.000  5.815    15.000  7.750  8.200     8.833 11.750 11.349



input contains mostly move operations, i.e., the sizes of S_i and D_i are at most 2 for at least 80% of the items. The Zipf distribution does not provide enough cloning operations for the KKW algorithm to show its true potential (the sizes of G_i are mostly zero, with one or two items having size 1 or 2). Processing items one by one is bad because we have a lot of items (more than 300 in Parameter Setting (A)), but most of the operations can be done in parallel (we have 60 disks, meaning that we can support 30 copy operations in one round). Under the Zipf distribution, since the sizes of most sets G_i are zero, the data migration variant that sends items from S_i directly to D_i is essentially the same as the coloring heuristic. Thus they have almost the same performance.

Under different input parameters. In addition to the Zipf distribution, we also tried the Geometric distribution because we wanted to investigate the performance of the algorithms under more skewed distributions, where more cloning of items is necessary. As we can see in Figure 3 and Table 1, the performance of the coloring heuristic is worse than that of the KKW algorithms, especially when p is large (more skewed) or when the ratio of the load capacity to the space capacity is high. However, the matching-based heuristics still perform the best.

We also investigated how a higher ratio of load capacity to space capacity affects the performance of the different algorithms. We found the results to be qualitatively similar, and thus we omit them here.

Moreover, for the Zipf distribution, we assigned θ, which controls the skewness of the distribution, the values 0.0 and 0.5. We found that the results are similar in both cases. For the Geometric distribution, by contrast, a higher value of p (0.5 vs. 0.25) gives the KKW algorithms an advantage over the coloring heuristic, as more cloning is necessary.

Miscellaneous. Tables 2 and 3 show the performance of the different algorithms using inputs where the target layout is derived from the initial layout, as described in Step 2(b)i and Step 2(b)ii in Section 5. Note that when the initial and target layouts are very similar, the data migration can be done very quickly. The number of rounds taken is much smaller than the number of rounds taken using inputs generated by running the sliding window algorithm twice. This illustrates that it may be worthwhile to consider developing an algorithm that takes an initial layout and the new access pattern, and then derives a target layout with the optimization of the data migration process in mind. Developing these types of algorithms and evaluating their performance characteristics is part of our future efforts.

We compared the performance of the different algorithms under the two shuffling methods described in Section 5. We found the results to be qualitatively similar, and thus we omit them here due to lack of space.

We now consider the running time of the different data migration algorithms. Except 
for the matching heuristics, all other algorithms' running times are at most 3 CPU seconds
and often less than 1 CPU second, on a Sun Fire V880 server. The running time of the 
matching heuristics depends on the total number of items. It can be as high as 43 CPU 
seconds when the number of items is around 3500, and it can be lower than 2 CPU 
seconds when the number of items is around 500. 

Final Conclusions. For the correspondence problem question posed in Section 1.1, our 
experiments indicate that weighted matching is the best approach among the ones we 
tried. 

For the data migration problem question posed in Section 1.1, our experiments indicate that the weighted matching heuristic with some randomness does very well. This suggests that perhaps a variation of matching can be used to obtain an O(1) approximation as well. Among all variants of the KKW algorithm, letting S_i send item i to D_i directly for the small sets, i.e., variant (c), performs the best. From the results described above we can conclude that under the Zipf and Geometric distributions, where cloning does not occur frequently, the weighted matching heuristic returns a schedule that requires only a few rounds more than the optimal. All variants of the KKW algorithm usually take at most 10 rounds more than the optimal when a good correspondence method is used.



References 

1. E. Anderson, J. Hall, J. Hartline, M. Hobbes, A. Karlin, J. Saia, R. Swaminathan and J. Wilkes. An Experimental Study of Data Migration Algorithms. Workshop on Algorithm Engineering, 2001.

2. A. L. Chervenak. Tertiary Storage: An Evaluation of New Applications. Ph.D. Thesis, UC 
Berkeley, 1994. 

3. S. Ghandeharizadeh and R. R. Muntz. Design and Implementation of Scalable Continuous 
Media Servers. Parallel Computing Journal, 24(1):91-122, 1998. 

4. L. Golubchik, S. Khanna, S. Khuller, R. Thurimella and A. Zhu. Approximation Algorithms 
for Data Placement on Parallel Disks. Proc. of ACM-SIAM SODA, 2000. 

5. J. Hall, J. Hartline, A. Karlin, J. Saia and J. Wilkes. On Algorithms for Efficient Data Migration. 
Proc. of ACM-SIAM SODA, 620-629, 2001. 

6. S. Khuller, Y. Kim and Y-C. Wan. Algorithms for Data Migration with Cloning. 22nd ACM 
Symposium on Principles of Database Systems (PODS), 27-36, 2003. 

7. H. Shachnai and T. Tamir. On two class-constrained versions of the multiple knapsack problem. Algorithmica, 29:442-467, 2001.

8. J. Wolf, H. Shachnai and P. Yu. DASD Dancing: A Disk Load Balancing Optimization Scheme 
for Video-on-Demand Computer Systems. ACM SIGMETRICS/Performance Conf., 157-166,
1995. 



Classroom Examples of Robustness Problems in 
Geometric Computations* 

(Extended Abstract) 



Lutz Kettner¹, Kurt Mehlhorn¹, Sylvain Pion², Stefan Schirra³, and Chee Yap⁴



¹ MPI für Informatik, Saarbrücken, {ketter,mehlhorn}@mpi-sb.mpg.de
² INRIA Sophia Antipolis, Sylvain.Pion@sophia.inria.fr
³ Otto-von-Guericke-Universität, Magdeburg, stschirr@isg.cs.uni-magdeburg.de
⁴ New York University, New York, USA, yap@cs.nyu.edu



Abstract. The algorithms of computational geometry are designed for a ma- 
chine model with exact real arithmetic. Substituting floating point arithmetic 
for the assumed real arithmetic may cause implementations to fail. Although 
this is well known, there is no comprehensive documentation of what can 
go wrong and why. In this extended abstract, we study a simple incremen- 
tal algorithm for planar convex hulls and give examples which make the al- 
gorithm fail in all possible ways. We also show how to construct failure- 
examples semi-systematically and discuss the geometry of the floating point 
implementation of the orientation predicate. We hope that our work will 
be useful for teaching computational geometry. The full paper is available 
at www.mpi-sb.mpg.de/~mehlhorn/ftp/ClassRoomExamples.ps. It contains further examples, more theory, and color pictures. We strongly recommend reading the full paper instead of this extended abstract.



1 Introduction 

The algorithms of computational geometry are designed for a machine model with exact 
real arithmetic. Substituting floating point arithmetic for the assumed real arithmetic may 
cause implementations to fail. Although this is well known, it is not common knowledge. 
There is no paper which systematically discusses what can go wrong and provides simple 
examples for the different ways in which floating point implementations can fail. Due 
to this lack of examples, instructors of computational geometry have little material for 
demonstrating the inadequacy of floating point arithmetic for geometric computations, 
students of computational geometry and implementers of geometric algorithms still 
underestimate the seriousness of the problem, and researchers in our and neighboring 
disciplines, e.g., numerical analysis, still believe that simple approaches are able to
overcome the problem. 

In this extended abstract, we study a simple incremental algorithm for planar convex 
hulls and show how it can fail and why it fails when executed with floating point arithmetic. The convex hull CH(S) of a set S of points in the plane is the smallest convex polygon containing S. A point p ∈ S is called extreme in S if CH(S) ≠ CH(S \ {p}).

* Partially supported by the IST Programme of the EU under Contract No IST-2000-26473 (Effective Computational Geometry for Curves and Surfaces (ECG)).



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 702-713, 2004. 
The extreme points of S form the vertices of the convex hull polygon. Convex hulls can
be constructed incrementally. One starts with three non-collinear points in S and then 
considers the remaining points in arbitrary order. When a point is considered and lies 
inside the current hull, the point is simply discarded. When the point lies outside, the 
tangents to the current hull are constructed and the hull is updated appropriately. We give 
a more detailed description of the algorithm in Section 4 and the complete C++ program 
in the full paper. 
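To make the description concrete, here is a minimal Python sketch of an incremental hull in the same spirit. It is not the authors' C++ program (that is in the full paper); the names `orientation` and `incremental_hull` are ours, we assume the input contains three non-collinear points, and we convert coordinates to `fractions.Fraction` so that every orientation test is exact, which is precisely the real-arithmetic assumption that a floating point implementation violates.

```python
from fractions import Fraction


def orientation(p, q, r):
    """Exact sign of (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (d > 0) - (d < 0)


def incremental_hull(points):
    """Counterclockwise convex hull, built incrementally with exact arithmetic.

    Assumes at least three non-collinear input points.  A point that sees no
    hull edge (inside or on the boundary) is discarded; otherwise the chain of
    edges it sees is replaced by the two tangents through it.  Collinear
    boundary vertices are not cleaned up in this short sketch.
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    a = pts[0]
    b = next(p for p in pts if p != a)
    c = next(p for p in pts if orientation(a, b, p) != 0)
    hull = [a, b, c] if orientation(a, b, c) > 0 else [a, c, b]
    for r in pts:
        n = len(hull)
        # edge k runs from hull[k] to hull[k + 1]; r sees it if r lies to its right
        sees = [orientation(hull[k], hull[(k + 1) % n], r) < 0 for k in range(n)]
        if not any(sees):
            continue
        start = next(k for k in range(n) if sees[k] and not sees[k - 1])
        end = start
        while sees[(end + 1) % n]:
            end = (end + 1) % n
        m = (end - start) % n + 1  # number of visible edges
        # keep hull[end + 1], ..., hull[start] and close the polygon through r
        hull = [hull[(end + 1 + t) % n] for t in range(n - m + 1)] + [r]
    return hull
```

With exact arithmetic the visible edges always form one contiguous chain, which is what the `start`/`end` scan relies on; the examples in this paper show that this property can fail once `orientation` is evaluated in floating point.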

Figures 2 and 5 show point sets (we give the numerical coordinates of the points in 
Section 4) and the respective convex hulls computed by the floating point implementation 
of our algorithm. In each case the input points are indicated by small circles, the computed 
convex hull polygon is shown in light grey, and the alleged extreme points are shown 
as filled circles. The examples show that the implementation may make gross mistakes. 
It may leave out points which are clearly extreme, it may compute polygons which are 
clearly non-convex, and it may even run forever (example not shown here). 

The first contribution of this paper is to provide a set of instances that make the floating 
point implementation fail in disastrous ways. The computed results do not resemble the 
correct results in any reasonable sense. 

Our second contribution is to explain why these disasters happen. The correctness of 
geometric algorithms depends on geometric properties, e.g., a point lies outside a convex 
polygon if and only if it can see one of the edges. We give examples for which a floating
point implementation violates these properties: a point outside a convex polygon which 
sees no edge (in a floating point implementation of “sees”) and a point not outside which 
sees some edges (in a floating point implementation of “sees”). We give examples for 
all possible violations of the correctness properties of our algorithm. 

Our third contribution is to show how difficult examples can be constructed system- 
atically or at least semi-systematically. This should allow others to do similar studies. 

We believe that the paper and its companion web page will be useful in teaching 
computational geometry, and that even experts will find it surprising and instructive to see in
how many ways and how badly even simple algorithms can be made to fail. The compan- 
ion web page (http://www.mpi-sb.mpg.de/~kettner/proj/NonRobust/) contains the 
source code of all programs, the input files for all examples, and installation procedures. 
It allows the reader to perform our and further experiments. 

This paper is structured as follows. In Section 2 we discuss the ground rules for 
our experiments, in Section 3 we study the effect of floating point arithmetic on one 
of the most basic predicates of planar geometry, the orientation predicate, in Section 4 
we discuss an incremental algorithm for planar convex hulls. Finally, Section 5 offers 
a short conclusion. In the full paper, we also discuss the gift wrapping algorithm for 
planar convex hulls and an incremental algorithm for 3d Delaunay triangulations, and 
also give additional theory. 

Related Work: The literature contains a small number of documented failures due 
to numerical imprecision, e.g., Forrest’s seminal paper on implementing the point-in- 
polygon test [For85], Shewchuk’s example for divide-and-conquer Delaunay triangu- 
lation [She97], Ramshaw’s braided lines [MN99, Section 9.6.2], Schirra’s example for 
convex hulls [MN99, Section 9.6.1], and the sweep line algorithm for line segment 
intersection and boolean operations on polygons [MN99, Sections 10.7.4 and 10.8.3]. 
2 Ground Rules for Our Experiments

Our codes are written in C++ and the results are reproducible on any platform compliant with IEEE floating-point standard 754 for double precision (see [Gol90]), and also with other programming languages. All programs and input data can be found on the companion web page. Numerical computations are based on IEEE arithmetic. In particular, numbers are machine floating point numbers (called doubles for short). Such numbers have the form ±m·2^e, where m = 1.m_1 m_2 … m_52 (m_i ∈ {0, 1}) is the mantissa in binary and e is the exponent satisfying −1024 < e < 1024. The results of arithmetic operations are rounded to the nearest double (with ties broken using some fixed rule).

Our numerical example data will be written in decimals (for human consumption). 
Such decimal values, when read into the machine, are internally represented by the 
nearest double. We have made sure that our data can be represented exactly by doubles 
and their conversion to binary and back to decimal is the identity operation. 
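Both ground rules can be checked in a few lines of Python, whose `float` is an IEEE 754 double on the usual platforms (an assumption about the host, not something the paper relies on). In the sketch below, `decompose` and `exactly_representable` are hypothetical helper names: the first splits a double into sign, mantissa and exponent so that x = ±m·2^e with 1 ≤ m < 2, and the second tests whether a decimal literal is stored exactly by comparing exact rational values via `fractions.Fraction` and `decimal.Decimal`.

```python
import math
from decimal import Decimal
from fractions import Fraction


def decompose(x):
    """Return (sign, m, e) with x == sign * m * 2**e and 1 <= m < 2
    (for finite, nonzero, normalized x)."""
    mant, exp = math.frexp(abs(x))  # abs(x) == mant * 2**exp, 0.5 <= mant < 1
    return (-1 if x < 0 else 1), mant * 2, exp - 1


def exactly_representable(literal):
    """True if the decimal string denotes exactly the double it is read into."""
    return Fraction(Decimal(literal)) == Fraction(float(literal))
```

For example, `decompose(0.75)` returns `(1, 1.5, -1)`, and `exactly_representable("0.5")` is true while `exactly_representable("0.1")` is false, since 0.1 has no finite binary expansion. Input data that passes this check is read in without any rounding, which is the property the authors ensured for their examples.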



3 The Orientation Predicate 



Three points p, q, and r in the plane either lie on a common line or form a left or right 
turn. The triple (p, q, r) forms a left (right) turn if r lies left (right) of the line through p
and q and oriented in the direction from p to q. Analytically, the orientation of the triple 
(p, q, r) is tantamount to the sign of a determinant: 



$$\mathit{orientation}(p, q, r) = \operatorname{sign}\left(\det \begin{pmatrix} p_x & p_y & 1 \\ q_x & q_y & 1 \\ r_x & r_y & 1 \end{pmatrix}\right), \tag{1}$$



where p = (p_x, p_y), etc. We have orientation(p, q, r) = +1 (resp., -1, 0) iff the polyline
(p, q, r) represents a left turn (resp., right turn, collinearity). Interchanging two points
in the triple changes the sign of the orientation. We denote the x-coordinate of a point p
also by x(p). We implement the orientation predicate in the straightforward way:



$$\mathit{orientation}(p, q, r) = \operatorname{sign}\big((q_x - p_x)(r_y - p_y) - (q_y - p_y)(r_x - p_x)\big). \tag{2}$$



When the orientation predicate is implemented in this obvious way and evaluated with
floating point arithmetic, we call it float_orient(p, q, r) to distinguish it from the ideal
predicate. Since floating point arithmetic incurs round-off error, there are potentially
three ways in which the result of float_orient could differ from the correct orientation:

- rounding to zero: we misclassify a + or - as a 0;

- perturbed zero: we misclassify 0 as + or -;

- sign inversion: we misclassify a + as - or vice versa.
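A direct C++ transcription of Eq. (2) might look as follows. It is a sketch rather than the authors' code, and the `Point` struct is our assumption. To reproduce the failures discussed below, the expression has to be evaluated in plain IEEE double arithmetic: a compiler that fuses a multiplication and an addition into a single FMA instruction (GCC and Clang can do this unless given `-ffp-contract=off`) rounds differently and may return different signs.

```cpp
#include <cassert>

struct Point { double x, y; };

// Straightforward evaluation of Eq. (2) with doubles: +1 for a left turn,
// -1 for a right turn, 0 for (apparent) collinearity. Each subtraction and
// multiplication is rounded, so nearly collinear inputs may be misclassified.
int float_orient(const Point& p, const Point& q, const Point& r) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (det > 0) - (det < 0);
}
```

On inputs with small integer coordinates all operations are exact, so the sketch agrees with the ideal predicate there; the interesting behavior only shows up for nearly collinear triples.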



3.1 The Geometry of Float-Orientation 

What is the geometry of float_orient, i.e., which triples of points are classified as left
turns, right turns, or straight? The following type of experiment answers the question:
We choose three points p1, p2, and p3 and then compute

$$\mathit{float\_orient}\big((x(p_1) + x \cdot u,\; y(p_1) + y \cdot u),\; p_2,\; p_3\big)$$

for 0 ≤ x, y ≤ 255, where u is the increment between adjacent floating point numbers
in the considered range; for example, $u = 2^{-53}$ if x(p1) = 1/2 and $u = 4 \cdot 2^{-53}$ if
x(p1) = 2 = 4 · 1/2. We visualize the resulting 256 × 256 array of signs as a 256 × 256
grid of pixels: A white (light grey, dark grey) pixel represents collinear (negative, positive,
respectively) orientation. In the figures in this section we also indicate an approximation
of the line through p2 and p3.

Fig. 1. The weird geometry of the float-orientation predicate: The figure shows the results of
float_orient((x(p1) + x · u, y(p1) + y · u), p2, p3) for 0 ≤ x, y ≤ 255, where u is the increment
between adjacent floating point numbers in the considered range. The result is color coded: White
(light grey, dark grey, resp.) pixels represent collinear (negative, positive, resp.) orientation. The
line through p2 and p3 is also shown.
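The experiment itself can be sketched in C++ as below. This is our reconstruction, not the companion code, and `orientation_grid` is a hypothetical name. It assumes that the same increment u applies to both coordinates of p1, which holds when x(p1) and y(p1) lie in the same binade, e.g., for p1 = (0.5, 0.5). The result is the 256 × 256 array of signs that the figure renders as pixels.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Point { double x, y; };

int float_orient(const Point& p, const Point& q, const Point& r) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (det > 0) - (det < 0);
}

// grid[y][x] = float_orient((x(p1) + x*u, y(p1) + y*u), p2, p3) for 0 <= x, y <= 255,
// where u is the gap between x(p1) and the next larger double (assumed to be the
// gap at y(p1) as well).
std::vector<std::vector<int>> orientation_grid(Point p1, Point p2, Point p3) {
    const double u =
        std::nextafter(p1.x, std::numeric_limits<double>::infinity()) - p1.x;
    std::vector<std::vector<int>> grid(256, std::vector<int>(256));
    for (int y = 0; y < 256; ++y)
        for (int x = 0; x < 256; ++x)
            grid[y][x] = float_orient({p1.x + x * u, p1.y + y * u}, p2, p3);
    return grid;
}
```

Calling `orientation_grid({0.5, 0.5}, {12, 12}, {24, 24})` corresponds to the setting of Figure 1(a); writing the grid out as an image is left to the reader.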

Figure 1(a) shows the result of our first experiment: We use the line defined by the
points p2 = (12, 12) and p3 = (24, 24) and query it near p1 = (0.5, 0.5). We urge the
reader to pause for a moment and to sketch what he/she expects to see. The authors
expected to see a white band around the diagonal with nearly straight boundaries. Even
for points with such simple coordinates the geometry of float_orient is quite weird: the
set of white points (= the points classified as on the line) does not resemble a straight
line and the sets of light grey or dark grey points do not resemble half-spaces. We even
have points that change the side of the line, i.e., points lying left (right) of the line that
are classified as right (left) of the line.

In Figures 1(b) and (c) we have given our base points p1 more complex coordinates
by adding some digits behind the binary point. This enhances the cancellation effects in
the evaluation of float_orient and leads to even more striking pictures. In (b), the light
grey region looks like a step function at first sight. Note, however, that it is not monotone,
has white rays extending into it, and light grey lines extruding from it. The white region
(= on-region) forms blocks along the line. Strangely enough, these blocks are separated
by dark and light grey lines. Finally, many points change sides. In Figure 1(c), we have
white blocks of varying sizes along the diagonal, thin white and partly light grey lines
extending into the dark grey region (similarly for the light grey region), light grey points
(the left upper corners of the white structures extending into the dark grey region) deep
inside the dark grey region, and isolated white points almost 100 units away from the diagonal.

All diagrams in Figure 1 exhibit block structure. We now explain why. Assume
we keep y fixed and vary only x. We evaluate float_orient((x(p1) + x · u, y(p1) + y · u),
p2, p3) for 0 ≤ x ≤ 255, where u is the increment between adjacent floating point
numbers in the considered range. Recall that orientation(p, q, r) = sign((q_x - p_x)(r_y - p_y) -
(q_y - p_y)(r_x - p_x)); here p is the perturbed point, q = p2 and r = p3. We incur round-off
error in the additions/subtractions and also in the multiplications. Consider first one of
the differences, say q_x - p_x. In (a), we have q_x ≈ 12 and p_x ≈ 0.5. Since 12 has four
binary digits, we lose the last four bits of x in the subtraction; in other words, the result
of the subtraction q_x - p_x is constant for $2^4$ consecutive values of x. Because of rounding
to nearest, the intervals of constant value are [8, 23], [24, 39], [40, 55], .... Similarly, the
floating point result of r_x - p_x is constant for $2^5$ consecutive values of x. Because of
rounding to nearest, the intervals of constant value are [16, 47], [48, 79], .... Overlaying
the two progressions gives intervals [16, 23], [24, 39], [40, 47], [48, 55], ... and this
explains the structure we see in the rows of (a). We see short blocks of length 8, 16, 24, ...
in (a). In (b) and (c), the situation is somewhat more complicated. It is again true that we
have intervals for x where the results of the subtractions are constant. However, since q
and r have more complex coordinates, the relative shifts of these intervals are different and
hence we see narrow and broad features.
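The block argument can be checked numerically. The sketch below (a hypothetical helper of our own) counts how many distinct doubles the difference q_x - (0.5 + x · u) takes for 0 ≤ x ≤ 255 with $u = 2^{-53}$. For q_x = 12 the differences lie in [8, 16), where adjacent doubles are 16u apart, so only 17 distinct values occur; for q_x = 24 they lie in [16, 32), where the spacing is 32u, and only 9 distinct values occur.

```cpp
#include <cassert>
#include <cmath>
#include <set>

// Counts the distinct results of qx - (0.5 + x*u) for x = 0..255, u = 2^-53.
// 0.5 + x*u is exact (it stays in [0.5, 1), where doubles are u apart), so all
// rounding happens in the subtraction.
int distinct_differences(double qx) {
    const double u = std::ldexp(1.0, -53);
    std::set<double> values;
    for (int x = 0; x < 256; ++x) values.insert(qx - (0.5 + x * u));
    return static_cast<int>(values.size());
}
```

Under IEEE round-to-nearest-even the runs of equal values are close to, but not exactly, $2^4$ and $2^5$ long, since ties at the run boundaries are resolved towards even mantissas.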



4 The Convex Hull Problem 



We discuss a simple incremental convex hull algorithm. We describe the algorithm, state 
the underlying geometric assumptions, give instances which violate the assumptions 
when used with floating point arithmetic, and finally show which disastrous effects
these violations may have on the result of the computation. 

The incremental algorithm maintains the current convex hull CH of the points seen so 
far. Initially, CH is formed by choosing three non-collinear points in S. It then considers 
the remaining points one by one. When considering a point r, it first determines whether 
r is outside the current convex hull polygon. If not, r is discarded. Otherwise, the hull 
is updated by forming the tangents from r to CH and updating CH appropriately. The 
incremental paradigm is used in Andrew’s variant [And79] of Graham’s scan [Gra72] 
and also in the randomized incremental algorithm [CS89]. 

The algorithm maintains the current hull as a circular list L = (v_0, v_1, ..., v_{k-1}) of
its extreme points in counter-clockwise order. The line segments (v_i, v_{i+1}), 0 ≤ i ≤ k - 1
(indices are modulo k) are the edges of the current hull. If orientation(v_i, v_{i+1}, r) < 0,
we say r sees the edge and say the edge (v_i, v_{i+1}) is visible from r. If
orientation(v_i, v_{i+1}, r) ≤ 0, we say that the edge (v_i, v_{i+1}) is weakly visible from r.
After initialization, k ≥ 3. The following properties are key to the operation of the
algorithm.

Property A: A point r is outside CH iff r can see an edge of CH. 

Property B: If r is outside CH, the edges weakly visible from r form a non-empty 
consecutive subchain; so do the edges that are not weakly visible from r. 
If (v_i, v_{i+1}), ..., (v_{j-1}, v_j) is the subsequence of weakly visible edges, the updated
hull is obtained by replacing the subsequence (v_{i+1}, ..., v_{j-1}) by r. The subsequence
(v_i, ..., v_j) is taken in the circular sense. E.g., if i > j then the subsequence is
(v_i, ..., v_{k-1}, v_0, ..., v_j). From these properties, we derive the following algorithm:



Initialize L to the counter-clockwise triangle (a, b, c). Remove a, b, c from S.
for all r ∈ S do
    if there is an edge e visible from r then
        Compute the sequence (v_i, ..., v_j) of edges that are weakly visible from r;
        Replace the subsequence (v_{i+1}, ..., v_{j-1}) by r;
    end if
end for



To turn the sketch into an algorithm, we provide more information about the substeps. 

1. How does one determine whether there is an edge visible from r? We iterate over the
edges in L, checking each edge using the orientation predicate. If no visible edge is
found, we discard r. Otherwise, we take any one of the visible edges as the starting
edge for the next item.

2. How does one identify the subsequence (v_i, ..., v_j)? Starting from a visible edge e,
we move counter-clockwise along the boundary until a non-weakly-visible edge is
encountered. Similarly, we move clockwise from e until a non-weakly-visible edge is
encountered.

3. How does one update the list L? We can delete the vertices in (v_{i+1}, ..., v_{j-1}) after
all visible edges are found, as suggested in the above sketch (“the off-line strategy”),
or we can delete them concurrently with the search for weakly visible edges (“the
on-line strategy”).
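The three substeps with the off-line strategy can be sketched in C++ as follows. This is our illustration, not the authors' implementation: the hull is a `std::vector` rather than a linked list, the first three input points are assumed to form a counter-clockwise triangle, and the counter-clockwise walk is bounded so that a point that weakly sees every edge (Failure (B1) below) raises an exception instead of looping forever.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Point { double x, y; };

// Eq. (2) evaluated in double arithmetic.
int float_orient(const Point& p, const Point& q, const Point& r) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (det > 0) - (det < 0);
}

// Incremental hull, off-line update. Assumes pts[0..2] form a counter-clockwise
// triangle. h is kept counter-clockwise; edge i is (h[i], h[(i + 1) % k]).
std::vector<Point> incremental_hull(const std::vector<Point>& pts) {
    std::vector<Point> h(pts.begin(), pts.begin() + 3);
    for (std::size_t t = 3; t < pts.size(); ++t) {
        const Point r = pts[t];
        const std::size_t k = h.size();
        auto orient_edge = [&](std::size_t i) {
            return float_orient(h[i], h[(i + 1) % k], r);
        };
        // Substep 1: look for any edge visible from r.
        std::size_t e = k;
        for (std::size_t i = 0; i < k; ++i)
            if (orient_edge(i) < 0) { e = i; break; }
        if (e == k) continue;  // no visible edge: r is discarded
        // Substep 2: extend the chain i..j of weakly visible edges around e.
        // len counts its edges; reaching k would mean Failure (B1).
        std::size_t i = e, j = e, len = 1;
        while (orient_edge((j + 1) % k) <= 0) {
            if (++len == k) throw std::runtime_error("r weakly sees every edge");
            j = (j + 1) % k;
        }
        // This walk stops at the latest at edge j + 1, which is not weakly visible.
        while (orient_edge((i + k - 1) % k) <= 0) {
            ++len;
            i = (i + k - 1) % k;
        }
        // Substep 3: keep h[j+1], ..., h[i] (circularly) and close the hull with r,
        // i.e. the chain's interior vertices h[i+1..j] are replaced by r.
        std::vector<Point> next;
        for (std::size_t s = 0; s + len <= k; ++s) next.push_back(h[(j + 1 + s) % k]);
        next.push_back(r);
        h.swap(next);
    }
    return h;
}
```

On well-separated input, e.g. the points (0, 0), (4, 0), (0, 4), (4, 4), (1, 1), (5, 5), the sketch returns the quadrilateral (0, 4), (0, 0), (4, 0), (5, 5). Fed with the nearly collinear instances of Section 4.1, it is expected to fail in the ways described there.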

There are four logical ways to negate Properties A and B: 

• Failure (A1): A point outside the current hull sees no edge of the current hull.

• Failure (A2): A point inside the current hull sees an edge of the current hull.

• Failure (B1): A point outside the current hull sees all edges of the convex hull.

• Failure (B2): A point outside the current hull sees a non-contiguous set of edges.

Failures (A1) and (A2) are equivalent to the negation of Property A. Similarly, Failures
(B1) and (B2) are complete for Property B if we take (A1) into account. Are all these
failures realizable? We now affirm this.
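Since only the outcomes of float_orient are observable, an implementation can detect some of these failures only from the visibility pattern. The sketch below (hypothetical names, our own) classifies the pattern of a query point with respect to a polygon: it flags (B1) and (B2) patterns, but it cannot tell Failure (A1) apart from a correctly discarded interior point, and it cannot detect (A2) at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

int float_orient(const Point& p, const Point& q, const Point& r) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (det > 0) - (det < 0);
}

enum class Pattern { NoVisibleEdge, AllWeaklyVisible, NonContiguous, Contiguous };

// Classifies the float_orient visibility pattern of r with respect to the edges
// (h[i], h[(i + 1) % k]) of a polygon h given in counter-clockwise order.
Pattern visibility_pattern(const std::vector<Point>& h, const Point& r) {
    const std::size_t k = h.size();
    std::vector<bool> weak(k);
    bool any_visible = false;
    for (std::size_t i = 0; i < k; ++i) {
        const int o = float_orient(h[i], h[(i + 1) % k], r);
        weak[i] = (o <= 0);
        any_visible = any_visible || (o < 0);
    }
    if (!any_visible) return Pattern::NoVisibleEdge;  // fine if r is inside, (A1) if not
    std::size_t changes = 0;  // switches between weakly visible and not, circularly
    for (std::size_t i = 0; i < k; ++i)
        if (weak[i] != weak[(i + 1) % k]) ++changes;
    if (changes == 0) return Pattern::AllWeaklyVisible;                  // (B1)
    return changes == 2 ? Pattern::Contiguous : Pattern::NonContiguous;  // (B2) if > 2
}
```

Exactly two switches correspond to Property B; zero switches with a visible edge is the (B1) pattern, and more than two switches is the (B2) pattern.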

4.1 Single Step Failures 

We give instances violating the correctness properties of the algorithm. More precisely,
we give sequences p1, p2, p3, ... of points such that the first three points form a
counter-clockwise triangle (and float_orient correctly discovers this) and such that the
insertion of some later point leads to a violation of a correctness property (in the
computations with doubles). We also discuss how we arrived at the examples. All our
examples involve nearly or truly collinear points; instances without nearly collinear or
truly collinear points cause no problems in view of the error analysis of the preceding
section (omitted in this extended abstract). Does this make our examples unrealistic? We
believe not. Many point sets contain nearly collinear points or truly collinear points which
become nearly collinear by conversion to floating point representation.

Fig. 2. (a) The convex hull illustrating Failure (A1). The point in the lower left corner is left out
of the hull. (b) schematically indicates the ridiculous situation of a point outside the current hull
and seeing no edge of the hull: x lies to the left of all sides of the triangle (p, q, r).

(A1) A point outside the current hull sees no edge of the current hull: Consider the set
of points below. Figure 2(a) shows the computed convex hull, where a point which is
clearly extreme was left out of the hull.

p1 = (7.3000000000000194, 7.3000000000000167)      float_orient(p1, p2, p3) > 0,
p2 = (24.000000000000068, 24.000000000000071)      float_orient(p1, p2, p4) > 0,
p3 = (24.00000000000005, 24.000000000000053)       float_orient(p2, p3, p4) > 0,
p4 = (0.50000000000001621, 0.50000000000001243)    float_orient(p3, p1, p4) > 0 (!!).
p5 = (8, 4)   p6 = (4, 9)   p7 = (15, 27)   p8 = (26, 25)   p9 = (19, 11)

What went wrong? The culprits are the first four points. They lie almost on the line
y = x, and float_orient gives the results shown above. Only the last evaluation is wrong,
indicated by “(!!)”. Geometrically, these four evaluations say that p4 sees no edge of the
triangle (p1, p2, p3). Figure 2(b) gives a schematic view of this ridiculous situation. The
points p5, ..., p9 are then correctly identified as extreme points and are added to the hull.
However, the algorithm never recovers from the error made when considering p4, and
the result of the computation differs drastically from the correct hull.

We next explain how we arrived at the instance above. Intuition told us that an
example (if it exists at all) would be a triangle with two almost parallel sides and with a
query point near the wedge defined by the two nearly parallel edges. In view of Figure 1,
such a point might be misclassified with respect to one of the edges and hence would be
unable to see any edge of the triangle. So we started with the points used in Figure 1(b),
i.e., p1 ≈ (17, 17), p2 ≈ (24, 24) ≈ p3, where we moved p2 slightly to the right so as to
guarantee that we obtain a counter-clockwise triangle. We then probed the edges incident
to p1 with points p4 in and near the wedge formed by these edges. Figure 3(a) visualizes
the outcomes of the two relevant orientation tests. Each black pixel (not lying on the line
indicating the nearly parallel edges of the triangle) is a candidate for Failure (A1). The
example obtained in this way was not completely satisfactory, since some orientation
tests on the initial triangle (p1, p2, p3) were evaluating to zero.

[Figure 3: panel (a) uses p1 = (17.300000000000001, 17.300000000000001), panel (b) uses
p1 = (7.3000000000000194, 7.3000000000000167); both panels use
p2 = (24.000000000000068, 24.000000000000071), p3 = (24.00000000000005, 24.000000000000053)
and p4 = (0.5, 0.5).]

Fig. 3. The points (p1, p2, p3) form a counter-clockwise triangle and we are interested in the
classification of points (x(p4) + x · u, y(p4) + y · u) with respect to the edges (p1, p2) and (p3, p1)
incident to p1. The extensions of these edges are indistinguishable in the pictures and are drawn as
a single black line. The black points not lying on the black diagonal do not “float-see” either one
of the edges (Failure A1). Points collinear with one of the edges are middle grey, those collinear
with both edges are light grey, those classified as seeing one but not the other edge are white, and
those seeing both edges are dark grey. (a) Example starting from points in Figure 1. (b) Example
that achieves “robustness” with respect to the first three points.

We perturbed the example further, aided by visualizing float_orient(p1, p2, p3), until
we found the example shown in (b); by our error analysis, this test incurs the largest error
among the six possible ways of performing the orientation test for three points and is
hence most likely to return the incorrect result. The final example has the nice property
that all possible float_orient tests on the first three points are correct. So this example
is pretty much independent of any conceivable initialization an algorithm could use
to create the first valid triangle. Figure 3(b) shows the outcomes of the two orientation
tests for our final example.
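The probing step can be sketched as follows. This is our reconstruction, and `a1_candidates` is a hypothetical name. It scans the same kind of 256 × 256 neighborhood of p4 as in Figure 3 and reports the offsets (x, y) of points that float-see neither edge incident to p1, i.e., the black pixels. Whether a reported point is a genuine instance of Failure (A1) still has to be checked separately, since points inside the triangle are reported as well.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Point { double x, y; };

int float_orient(const Point& p, const Point& q, const Point& r) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    return (det > 0) - (det < 0);
}

// Offsets (x, y), 0 <= x, y <= 255, of probe points q = (x(p4) + x*u, y(p4) + y*u)
// with float_orient(p1, p2, q) > 0 and float_orient(p3, p1, q) > 0, i.e. points that
// strictly float-see neither (p1, p2) nor (p3, p1). u is the gap above x(p4).
std::vector<std::pair<int, int>> a1_candidates(Point p1, Point p2, Point p3, Point p4) {
    const double u =
        std::nextafter(p4.x, std::numeric_limits<double>::infinity()) - p4.x;
    std::vector<std::pair<int, int>> found;
    for (int y = 0; y < 256; ++y)
        for (int x = 0; x < 256; ++x) {
            const Point q{p4.x + x * u, p4.y + y * u};
            if (float_orient(p1, p2, q) > 0 && float_orient(p3, p1, q) > 0)
                found.emplace_back(x, y);
        }
    return found;
}
```

For a well-conditioned triangle such as (0, 0), (4, 0), (0, 4), probes near (-1, -1) produce no candidates, while probes near the interior point (1, 1) are all reported, which is why the outside check cannot be skipped.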



(A2) A point inside the current hull sees an edge of the current hull: Such examples
are plentiful. We take any counter-clockwise triangle and choose a fourth point inside the
triangle but close to one of the edges. By Figure 1, there is a chance of sign reversal. A
concrete example follows:

p1 = (27.643564356435643, -21.881188118811881)    float_orient(p1, p2, p3) > 0
p2 = (83.366336633663366, 15.544554455445542)     float_orient(p1, p2, p4) < 0 (!!)
p3 = (4, 4)                                        float_orient(p2, p3, p4) > 0
p4 = (73.415841584158414, 8.8613861386138595)     float_orient(p3, p1, p4) > 0
Fig. 4. Schematics: The point p4 sees all edges of the triangle (p1, p2, p3).



The convex hull is correctly initialized to (p1, p2, p3). The point p4 is inside the
current convex hull, but the algorithm incorrectly believes that p4 can see the edge
(p1, p2) and hence changes the hull to (p1, p4, p2, p3), a slightly non-convex polygon.



(B1) A point outside the current hull sees all edges of the convex hull: Intuition told
us that an example (if it exists) would consist of a triangle with one angle close to π
and hence three almost parallel sides. Where should one place the query point? We first
placed it in the extension of the three parallel sides and quite a distance away from the
triangle. This did not work. The choice which worked is to place the point near one of
the sides so that it could see two of the sides and “float-see” the third. Figure 4 illustrates
this choice. A concrete example follows:



p1 = (200, 49.200000000000003)                    float_orient(p1, p2, p3) > 0
p2 = (100, 49.600000000000001)                    float_orient(p1, p2, p4) < 0
p3 = (-233.33333333333334, 50.93333333333333)     float_orient(p2, p3, p4) < 0
p4 = (166.66666666666669, 49.333333333333336)     float_orient(p3, p1, p4) < 0 (!!)



The first three points form a counter-clockwise oriented triangle and, according to
float_orient, the algorithm believes that p4 can see all edges of the triangle. What will our
algorithm do? It depends on the implementation details. If the algorithm first searches
for an invisible edge, it will search forever and never terminate. If it deletes points on-line
from L, it will crash or compute nonsense depending on the details of the implementation.



(B2) A point outside the current hull sees a non-contiguous set of edges: Consider the
following points:

p1 = (0.50000000000001243, 0.50000000000000189)    float_orient(p1, p4, p5) < 0 (!!)
p2 = (0.50000000000001243, 0.50000000900000333)    float_orient(p4, p3, p5) > 0
p3 = (24.00000000000005, 24.000000000000053)       float_orient(p3, p2, p5) < 0
p4 = (24.000000000000068, 24.000000000000071)      float_orient(p2, p1, p5) > 0
p5 = (17.300000000000001, 17.300000000000001)

Inserting the first four points results in the convex quadrilateral (p1, p4, p3, p2); this
is correct. The last point p5 sees only the edge (p3, p2) and none of the other three.
However, float_orient makes p5 see also the edge (p1, p4). The subsequences of visible
and invisible edges are not contiguous. Since the falsely classified edge (p1, p4) comes
first, our algorithm inserts p5 at this edge, removes no other vertex, and returns a polygon
that has self-intersections and is not simple.




Classroom Examples of Robustness Problems in Geometric Computations 







Fig. 5. (a) The hull constructed after processing points p1 to p5. Points p1 and p5 lie close to each
other and are indistinguishable in the upper figure. The schematic sketch below shows that we
have a concave corner at p5. The point p6 sees the edges (p1, p2) and (p4, p5), but does not see
the edge (p5, p1). One of the former edges will be chosen by the algorithm as the chain of edges
visible from p6. Depending on the choice, we obtain the hulls shown in (b) or (c). In (b), (p4, p5)
is found as the visible edge, and in (c), (p1, p2) is found. We refer the reader to the text for further
explanations. The figures show the coordinate axes for orientation.



4.2 Global Effects 



By now, we have seen examples which cover the negation space of the correctness
properties of the incremental algorithm and we have seen the effect of an incorrect
orientation test for a single update step. We next study global effects. The goal is to 
refute the myth that the algorithm will always compute an approximation of the true 
convex hull. 



The algorithm computes a convex polygon, but misses some of the extreme points: We
have already seen such an example in Failure (A1). We can modify this example so that
the ratio of the areas of the true hull and the computed hull becomes arbitrarily large.
We do as in Failure (A1), but move the fourth point to infinity. The true convex hull has
four extreme points. The algorithm misses q4.

q1 = (0.10000000000000001, 0.10000000000000001)    float_orient(q1, q2, q3) < 0
q2 = (0.20000000000000001, 0.20000000000000004)    float_orient(q1, q2, q4) = 0 (!!)
q3 = (0.79999999999999993, 0.80000000000000004)    float_orient(q2, q3, q4) = 0 (!!)
q4 = (1.267650600228229 · 10^30, 1.2676506002282291 · 10^30)    float_orient(q3, q1, q4) > 0



The algorithm crashes or does not terminate: See Failure (B1).
The algorithm computes a non-convex polygon: We have already given such an example
in Failure (A2). However, this failure is not visible to the naked eye. We next give
examples where non-convexity is visible to the naked eye. We consider the points:

p1 = (24.00000000000005, 24.000000000000053)     p2 = (24.0, 6.0)
p3 = (54.85, 6.0)                                 p4 = (54.850000000000357, 61.000000000000121)
p5 = (24.000000000000068, 24.000000000000071)    p6 = (6, 6).

After the insertion of p1 to p4, we have the convex hull (p1, p2, p3, p4). This is correct.
Point p5 lies inside the convex hull of the first four points, but float_orient(p4, p1, p5) < 0.
Thus p5 is inserted between p4 and p1, and we obtain (p1, p2, p3, p4, p5). However, this
error is not yet visible to the eye; see Figure 5(a).

The point p6 sees the edges (p4, p5) and (p1, p2), but does not see the edge (p5, p1).
All of this is correctly determined by float_orient. Consider now the insertion process
for point p6. Depending on where we start the search for a visible edge, we will either
find the edge (p4, p5) or the edge (p1, p2). In the former case, we insert p6 between p4
and p5 and obtain the polygon shown in (b). It is visibly non-convex and has a
self-intersection. In the latter case, we insert p6 between p1 and p2 and obtain the polygon
shown in (c). It is visibly non-convex.

Of course, in a deterministic implementation, we will see only one of the errors,
namely (b). This is because in our example, the search for a visible edge starts at edge
(p2, p3). In order to produce (c) with the implementation, we replace the point p2 by the
point p2' = (24.0, 10.0). Then p6 sees (p2', p3) and identifies (p1, p2', p3) as the chain of
visible edges and hence constructs (c).

5 Conclusion 

We provided instances which cause the floating point implementation of a simple convex 
hull algorithm to fail in many different ways. We showed how to construct such instances 
semi-systematically. We hope that our paper and its companion web page will be useful 
for classroom use and that it will alert students and researchers to the intricacies of 
implementing geometric algorithms. 

What can be done to overcome the robustness problem of floating point arithmetic?
There are essentially three approaches: (1) make sure that the implementation of the
orientation predicate always returns the correct result, or (2) change the algorithm so
that it can cope with the floating point implementation of the orientation predicate and
still computes something meaningful, or (3) perturb the input so that the floating point
implementation is guaranteed to produce the correct result on the perturbed input. The
first approach is the most general; it is known as the exact geometric computation
(EGC) paradigm and was first used by Jünger, Reinelt and Zepf [JRZ91] and
Karasick, Lieber and Nackman [KLN91]. For detailed discussions, we refer the reader
to [Yap04] and Chapter 9 of [MN99]. The EGC approach has been adopted for the
software libraries LEDA and CGAL and other successful implementations. The second and
third approaches have been successfully applied to a number of geometric problems,
see for example [Mil89, FM91, LM90, DSB92, SIII00, HS98]. But in the second approach
the interpretation of “meaningful” is a crucial and difficult problem. The third approach
should in principle be applicable to all problems with purely numerical input. There are
also suggested approaches, e.g., epsilon-tweaking, which do not work. Epsilon-tweaking
simply activates rounding to zero, both for correct and misclassified orientations. E.g.,
it is now more likely for points outside the current hull not to see any edges because of
enforced collinearity.
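To make the last point concrete, the sketch below contrasts an epsilon-tweaked predicate with an exact one. Both functions are our own illustrations. The exact variant only covers a restricted input class that we assume for the example, integer coordinates of magnitude below $2^{30}$, for which 64-bit integer arithmetic cannot overflow; it is not a substitute for the general EGC techniques cited above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Point { double x, y; };

// Epsilon-tweaked orientation (the approach argued against above): every
// determinant with |det| <= eps is declared collinear, whether or not it is.
int eps_orient(const Point& p, const Point& q, const Point& r, double eps) {
    const double det = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
    if (std::fabs(det) <= eps) return 0;
    return det > 0 ? 1 : -1;
}

// Exact orientation, assuming integer coordinates with |c| < 2^30: differences
// fit in 31 bits, products in 62 bits, so the int64_t result cannot overflow.
int exact_orient_int(std::int64_t px, std::int64_t py, std::int64_t qx,
                     std::int64_t qy, std::int64_t rx, std::int64_t ry) {
    const std::int64_t det = (qx - px) * (ry - py) - (qy - py) * (rx - px);
    return (det > 0) - (det < 0);
}
```

For the clearly non-collinear triple (0, 0), (10^6, 0), (5 · 10^5, 1), the exact predicate reports a left turn, whereas the epsilon-tweaked predicate with eps = 10^7 reports collinearity, i.e., it rounds a correct non-zero result to zero.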
Abstract. We introduce a new stable minimum storage algorithm for
merging that needs $O(m \log(n/m + 1))$ element comparisons, where m and
n are the sizes of the input sequences with m ≤ n. According to the lower
bound for merging, our algorithm is asymptotically optimal regarding the
number of comparisons.

The presented algorithm rearranges the elements to be merged by
rotations, where the areas to be rotated are determined by a simple principle
of symmetric comparisons. This style of minimum storage merging is
novel and looks promising.

Our algorithm has a short and transparent definition. Experimental work
has shown that it is very efficient and so might be of high practical
interest.



1 Introduction 



Merging denotes the operation of rearranging the elements of two adjacent sorted
sequences of sizes m and n, so that the result forms one sorted sequence of m + n
elements. An algorithm merges two adjacent sequences with minimum storage
[10] when it needs at most $O(\log^2(m + n))$ bits of additional space. This form of
merging represents a weakened form of in-place merging and allows the usage of
a stack that is logarithmically bounded in m + n. Minimum storage merging is
sometimes also referred to as in situ merging. A merging algorithm is regarded
as stable if it preserves the initial ordering of elements with equal value.

Some lower bounds for merging have been proven so far. The lower bound for the
number of assignments is m + n, because every element may change its position in
the sorted result. The lower bound for the number of comparisons is $\Omega(m \log(n/m))$
for m ≤ n. This can be proven by a combinatorial inspection combined with an
argument using decision trees. An accurate presentation of these bounds is
given by Knuth [10].

* This work was supported by the Kookmin University research grant in 2004. 



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 714-723, 2004. 
The simple standard merge algorithm is rather inefficient, because it uses linear
extra space and always needs a linear number of comparisons. In particular,
the need for extra space motivated the search for efficient in-place merging
algorithms. The first publication presenting an in-place merging algorithm was
due to Kronrod [11] in 1969. Kronrod’s unstable merge algorithm is based on a
partition merge strategy and uses an internal buffer as central component. This
strategy has been refined and improved in numerous subsequent publications. A
selection of these publications is [7, 8, 9, 3, 12, 15, 17, 16]; an accurate description of
the history and evolution of Kronrod-related algorithms can be found in [3]. The
more recent Kronrod-related publications, like [7] and [3], present quite complex
algorithms. The correctness of these algorithms is by no means immediately
clear.

A minimum storage merging algorithm that isn’t Kronrod-related was proposed
by Dudzinski and Dydek [5] in 1981. They presented a divide and conquer
algorithm that is asymptotically optimal regarding the number of comparisons but
nonlinear regarding the number of assignments. This algorithm was chosen as
the basis of the implementation of the merge_without_buffer function in the C++
Standard Template Libraries [2], possibly due to its transparent nature and short
definition.

A further method was proposed by Ellis and Markov in [6], where they
introduce a shuffle-based in situ merging strategy. But their algorithm needs
$O((n + m) \log(n + m))$ assignments and $O((n + m) \log(n + m))$ comparisons. So,
despite some practical value, the algorithm isn’t optimal from the theoretical
point of view.

We present a new stable minimum storage merging algorithm performing
$O(m \log(n/m + 1))$ comparisons and $O((m + n) \log m)$ assignments for two sequences of
size m and n (m ≤ n). Our algorithm is based on a simple strategy of symmetric
comparisons, which will be explained in detail by an example. We report on
benchmarks showing that the proposed algorithm is fast and efficient
compared to other minimum storage merging algorithms as well as the standard
algorithm. We finish with a conclusion, where we give a proposal for further
research.



2 The Symmerge Algorithm 



We start with a brief introduction of our approach to merging. Let us assume 
that we have to merge the two sequences u = (0,2, 5,9) and v = (1,4,7, 8). 
When we compare the input with the sorted result, we can see that in the result 
the last two elements of u occur on positions belonging to v, and the first two 
elements of v appear on positions belonging to u (see Fig. 1 a)). So, 2 elements 
were exchanged between u and v. The kernel of our algorithm is to compute 
this number of side-changing elements efficiently and then to exchange such a 
number of elements. More accurately, if we have to exchange n (n > 0) elements
between sequences u and v, we move the n greatest elements from u to v and
the n smallest elements from v to u, where the exchange of elements is realized
by a rotation. Then by recursive application of this technique to the arising
subsequences we get a sorted result. Fig. 1 illustrates this approach to merging
for our above example.

[Figure panels labeled “Recursion 1” and “Recursion 2”]

Fig. 1. Symmerge example

We will now focus on the process of determining the number of elements to be 
exchanged. This number can be determined by a process of symmetrical com- 
parisons of elements that happens according to the following principle: 

We start at the leftmost element in u and at the rightmost element in v and 
compare the elements at these positions. We continue doing so by symmetri- 
cally comparing element-pairs from the outsides to the middle. Fig. 1 b) shows 
the resulting pattern of mutual comparisons for our example. There can occur 
at most one position, where the relation between the compared elements alters 
from ’not greater’ to ’greater’. In Figure 1 b) two thick lines mark this position. 
These thick lines determine the number of side-changing elements as well as the 
bounds for the rotation mentioned above. 

Due to this technique of symmetric comparisons we call our algorithm Symmerge.
Please note: if the bounds are on the leftmost and rightmost positions, all elements
of u are greater than all elements of v; we then exchange u and v and immediately
get a sorted result. Conversely, if both bounds meet in the middle, we terminate
immediately, because uv is then already sorted. So our algorithm can take advantage
of the sortedness of the input sequences.
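The linear process of symmetric comparisons and the subsequent rotation can be sketched in C++ for two blocks of equal length L stored contiguously as a = u v, which is the situation of the example above. The helper names are ours; the general algorithm handles unequal lengths through the active area defined in Section 2.1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// a holds u = a[0..L) followed by v = a[L..2L), both sorted ascending.
// Compare u[i] with v[L-1-i] from the outside towards the middle; the first
// pair where u[i] is greater marks the bounds. Returns k = number of elements
// that change sides.
std::size_t side_changing_linear(const std::vector<int>& a, std::size_t L) {
    std::size_t i = 0;
    while (i < L && a[i] <= a[2 * L - 1 - i]) ++i;  // 'not greater'
    return L - i;
}

// Exchanges the k greatest elements of u with the k smallest elements of v
// by rotating a[L-k .. L+k) so that a[L] becomes its first element.
void exchange(std::vector<int>& a, std::size_t L, std::size_t k) {
    std::rotate(a.begin() + (L - k), a.begin() + L, a.begin() + (L + k));
}
```

For u = (0, 2, 5, 9) and v = (1, 4, 7, 8), two elements change sides and the rotation yields (0, 2, 1, 4, 5, 9, 7, 8); the recursion then continues on the halves (0, 2 | 1, 4) and (5, 9 | 7, 8).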

So far we introduced the computation of the number of side-changing elements as a
linear process of symmetric comparisons. But this computation may also happen
in the style of a binary search. Then only $\lfloor \log(\min(|u|, |v|)) \rfloor + 1$ comparisons
are necessary to compute the number of side-changing elements.
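The binary-search variant exploits the fact that the outcome of the symmetric comparisons switches at most once. The sketch below (again for two equal-length blocks a = u v, with hypothetical names) also counts element comparisons, which stay within $\lfloor \log_2 L \rfloor + 1$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a holds u = a[0..L) followed by v = a[L..2L), both sorted ascending.
// Binary search for the first i with u[i] > v[L-1-i]; returns k = L - i,
// the number of side-changing elements. 'comparisons' receives the number
// of element comparisons performed.
std::size_t side_changing_binary(const std::vector<int>& a, std::size_t L,
                                 std::size_t& comparisons) {
    std::size_t lo = 0, hi = L;  // the switch position lies in [lo, hi]
    comparisons = 0;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ++comparisons;
        if (a[mid] <= a[2 * L - 1 - mid]) lo = mid + 1;  // 'not greater'
        else hi = mid;                                  // 'greater'
    }
    return L - lo;
}
```

On the example u = (0, 2, 5, 9), v = (1, 4, 7, 8), the search needs two comparisons to find that two elements change sides.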



2.1 Formal Definition 

Let u and v be two adjacent ascending sorted sequences. We define u ≤ v (u < v)
iff x ≤ y (x < y) for all elements x ∈ u and for all elements y ∈ v.

We merge u and v as follows: 

If |u| ≤ |v|, then

(a1) we decompose v into v1 w v2 such that |w| = |u| and either |v2| = |v1|
     or |v2| = |v1| + 1.

(a2) we decompose u into u1 u2 (|u1| ≥ 0, |u2| ≥ 0) and w into w1 w2
     (|w1| ≥ 0, |w2| ≥ 0) such that |u1| = |w2|, |u2| = |w1| and u1 ≤ w2,
     u2 > w1.

(a3) we recursively merge u1 with v1 w1 as well as u2 with w2 v2. Let u'
     and v' be the resulting sequences, respectively.

else

(b1) we decompose u into u1 w u2 such that |w| = |v| and either |u2| = |u1|
     or |u2| = |u1| + 1.

(b2) we decompose v into v1 v2 (|v1| ≥ 0, |v2| ≥ 0) and w into w1 w2
     (|w1| ≥ 0, |w2| ≥ 0) such that |v1| = |w2|, |v2| = |w1| and w1 ≤ v2,
     w2 > v1.

(b3) we recursively merge u1 w1 with v1 as well as w2 u2 with v2. Let u'
     and v' be the resulting sequences, respectively.

u'v' then contains all elements of u and v in sorted order.

Fig. 2 contains an accompanying graphical description of the process described above. The steps (a1) and (b1) handle input sequences of different lengths by cutting a subsection $w$ out of the middle of the longer sequence as the “active area”. This active area has the same size as the shorter of the two input sequences. The decomposition formulated by the steps (a2) and (b2) can be achieved efficiently by applying the principle of symmetric comparisons between the shorter sequence $u$ (or $v$) and the active area $w$. After the decomposition step (a2) (or (b2)), the subsequence $u_2 v_1 w_1$ (or $w_2 u_2 v_1$) is rotated so that we get the subsequences $u_1 v_1 w_1$ and $u_2 w_2 v_2$ (or $u_1 w_1 v_1$ and $w_2 u_2 v_2$). Treating pairs of equal elements as part of the “outer blocks” ($u_1, w_2$ in (a2) and $w_1, v_2$ in (b2)) avoids the exchange of equal elements and thus any reordering of them.

Fig. 2. Illustration of Symmerge

Algorithm 1 Symmerge algorithm

Symmerge(A, first1, first2, last)
  if first1 < first2 and first2 < last then
    m ← ⌊(first1 + last)/2⌋
    n ← m + first2
    if first2 > m then
      start ← Bsearch(A, n − last, m, n − 1)
    else
      start ← Bsearch(A, first1, first2, n − 1)
    end ← n − start
    Rotate(A, start, first2, end)
    Symmerge(A, first1, start, m)
    Symmerge(A, m, end, last)

Bsearch(A, l, r, p)
  while l < r
    m ← ⌊(l + r)/2⌋
    if A[m] ≤ A[p − m]
      then l ← m + 1
      else r ← m
  return l

Corollary 1. Symmerge is stable. 

Algorithm 1 gives an implementation of the Symmerge algorithm in pseudocode. The pseudocode conventions are taken from [4].
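To make the index arithmetic of Algorithm 1 easy to check, here is our own transcription into Python (a sketch, not the authors’ code). It assumes half-open ranges, i.e. the two sorted runs are A[first1:first2] and A[first2:last], and that Rotate(A, start, first2, end) exchanges the blocks A[start:first2] and A[first2:end]. For brevity the rotation uses slice assignment, which creates temporary lists and therefore does not keep the minimum-storage property; an in-place rotation such as the one from [5] would.

```python
def bsearch(a, l, r, p):
    """Bsearch of Algorithm 1: first index m in [l, r) with a[m] > a[p - m]."""
    while l < r:
        m = (l + r) // 2
        if a[m] <= a[p - m]:
            l = m + 1
        else:
            r = m
    return l


def rotate(a, start, middle, end):
    """Exchange the blocks a[start:middle] and a[middle:end] (slice-based)."""
    a[start:end] = a[middle:end] + a[start:middle]


def symmerge(a, first1, first2, last):
    """Stably merge the sorted runs a[first1:first2] and a[first2:last]."""
    if first1 < first2 < last:
        m = (first1 + last) // 2      # midpoint of the whole range
        n = m + first2                # a[x] is compared with a[n - 1 - x]
        if first2 > m:                # left run is the longer one (case b)
            start = bsearch(a, n - last, m, n - 1)
        else:                         # right run is at least as long (case a)
            start = bsearch(a, first1, first2, n - 1)
        end = n - start
        rotate(a, start, first2, end)  # move the side-changing blocks
        symmerge(a, first1, start, m)
        symmerge(a, m, end, last)


a = [1, 3, 5, 7, 2, 4, 6, 8]
symmerge(a, 0, 4, 8)
print(a)  # prints: [1, 2, 3, 4, 5, 6, 7, 8]
```

Only `<=` is applied to the elements, so the sketch also works for objects that order themselves by a key.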



3 Worst Case Complexity 

We will now investigate the worst case complexity of Symmerge regarding the 
number of comparisons and assignments. 

Unless stated otherwise, let us denote $m = |u|$, $n = |v|$, $m \le n$, $k = \lfloor \log m \rfloor$, and let $m_j^i$ and $n_j^i$ denote the minimum and maximum of the lengths of the two sequences merged in the $j$-th subsequence merging of the $i$-th recursion group, for $i = 0, 1, \ldots, k$ and $j = 1, 2, 3, \ldots, 2^i$ (initially $m_1^0 = m$ and $n_1^0 = n$). A recursion group consists of one or several recursion levels and comprises at most $2^i$ ($i = 0, 1, \ldots, k$) subsequence mergings (see Fig. 3). In the special case where each subsequence merging always triggers two nonempty recursive calls, the recursion depth becomes exactly $k = \lfloor \log m \rfloor$ and recursion groups and recursion levels are identical; in general, however, the recursion depth $d_p$ satisfies $k = \lfloor \log m \rfloor \le d_p \le m$. Further, for each recursion group $i = 0, 1, \ldots, k$, it holds that $\sum_{j=1}^{2^i} (m_j^i + n_j^i) = m + n$.

Fig. 3. Maximum spanning case

Lemma 1. ([5] Lemma 3.1) If $k = \sum_{j=1}^{2^i} k_j$ for any integer $i \ge 0$, then $\sum_{j=1}^{2^i} \log k_j \le 2^i \log(k/2^i)$.



Theorem 1. The Symmerge algorithm needs $O(m \log(n/m + 1))$ comparisons.

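Proof. The number of comparisons for the binary search in recursion group 0 is equal to $\lfloor \log m \rfloor + 1 \le \lfloor \log(m + n) \rfloor + 1$. For recursion group 1 we need at most $\log(m_1^1 + n_1^1) + 1 + \log(m_2^1 + n_2^1) + 1$ comparisons, and so on. For recursion group $i$ we need at most $\sum_{j=1}^{2^i} \log(m_j^i + n_j^i) + 2^i$ comparisons. Since $\sum_{j=1}^{2^i} (m_j^i + n_j^i) = m + n$, Lemma 1 gives $\sum_{j=1}^{2^i} \log(m_j^i + n_j^i) + 2^i \le 2^i \log((m+n)/2^i) + 2^i$. So the overall number of comparisons for all $k + 1$ recursion groups is not greater than

$$\sum_{i=0}^{k} \left(2^i + 2^i \log\frac{m+n}{2^i}\right) = 2^{k+1} - 1 + (2^{k+1} - 1)\log(m+n) - \sum_{i=0}^{k} i\,2^i.$$

Since $\sum_{i=0}^{k} i\,2^i = (k-1)2^{k+1} + 2$, algorithm Symmerge needs at most

$$2^{k+1} - 1 + (2^{k+1} - 1)\log(m+n) - (k-1)2^{k+1} - 2 = 2^{k+1}\log(m+n) - k\,2^{k+1} + 2^{k+2} - \log(m+n) - 3 \le 2m\left(\log\frac{m+n}{m} + 2\right) - \log(m+n) - 3 = O\!\left(m \log\left(\frac{n}{m} + 1\right)\right)$$

comparisons. (For the inequality, write $x = 2^{k+1}$; the left-hand side equals $x(\log((m+n)/x) + 3) - \log(m+n) - 3$, which is increasing in $x$ for $0 < x \le m + n$, and $x \le 2m \le m + n$.) □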
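The closed forms used in the last steps of the proof are easy to check numerically. The following sketch (helper names are ours) takes log to base 2 and $k = \lfloor \log m \rfloor$, verifies the identity $\sum_{i=0}^{k} i\,2^i = (k-1)2^{k+1} + 2$, checks that the sum over the recursion groups equals the closed form, and checks that the closed form stays below $2m(\log((m+n)/m) + 2) - \log(m+n) - 3$ for $m \le n$. It tests the algebra only, not the comparison count of an actual implementation.

```python
import math


def sum_i_pow2(k):
    """Left-hand side of the identity sum_{i=0}^{k} i * 2^i."""
    return sum(i * 2**i for i in range(k + 1))


def group_sum(m, n):
    """sum_{i=0}^{k} (2^i + 2^i * log2((m + n) / 2^i)) with k = floor(log2 m)."""
    k = m.bit_length() - 1
    return sum(2**i + 2**i * math.log2((m + n) / 2**i) for i in range(k + 1))


def closed_form(m, n):
    """2^{k+1} log(m+n) - k 2^{k+1} + 2^{k+2} - log(m+n) - 3."""
    k = m.bit_length() - 1
    lg = math.log2(m + n)
    return 2**(k + 1) * lg - k * 2**(k + 1) + 2**(k + 2) - lg - 3


def final_bound(m, n):
    """2m (log((m+n)/m) + 2) - log(m+n) - 3."""
    return 2 * m * (math.log2((m + n) / m) + 2) - math.log2(m + n) - 3


for k in range(20):
    assert sum_i_pow2(k) == (k - 1) * 2**(k + 1) + 2

for m in range(1, 200):
    for n in range(m, 400, 7):
        assert math.isclose(group_sum(m, n), closed_form(m, n), rel_tol=1e-9, abs_tol=1e-9)
        assert closed_form(m, n) <= final_bound(m, n) + 1e-6
```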



Theorem 2. The recursion depth of Symmerge is bounded by $\lceil \log(m + n) \rceil$.

Proof. The decomposition steps (a1) and (b1) satisfy the property $\max\{|u_1|, |u_2|, |v_1|, |v_2|, |w_1|, |w_2|\} \le (|u| + |v|)/2$. So, on recursion level $\lceil \log(m + n) \rceil$ all remaining unmerged subsequences are of length 1. □

Corollary 2. Symmerge is a minimum storage algorithm. 

In [5] Dudzinski and Dydek presented an optimal in-place rotation (sequence exchange) algorithm that needs $m + n + \gcd(m, n)$ element assignments for exchanging two sequences of lengths $m$ and $n$.
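The count $m + n + \gcd(m, n)$ is attained by the classical cycle-following (“juggling”) rotation: exchanging the two blocks decomposes into $\gcd(m, n)$ cycles, and each cycle of length $(m+n)/\gcd(m, n)$ costs one assignment more than its length because one element is parked in a temporary. The sketch below implements that scheme and counts assignments; we assume, but have not verified against [5], that it coincides with the algorithm of Dudzinski and Dydek. The name `rotate_juggling` is hypothetical.

```python
from math import gcd


def rotate_juggling(a, first, middle, last):
    """Exchange a[first:middle] and a[middle:last] in place by following cycles.

    Returns the number of element assignments, counting the save into and
    the restore from the temporary variable of each cycle.
    """
    m, n = middle - first, last - middle
    if m == 0 or n == 0:
        return 0
    total = m + n
    assignments = 0
    for start in range(gcd(m, n)):       # one cycle per residue class mod gcd
        tmp = a[first + start]           # park the cycle's first element
        assignments += 1
        cur = start
        while True:
            nxt = cur + m                # new a[cur] = old a[(cur + m) mod total]
            if nxt >= total:
                nxt -= total
            if nxt == start:
                break
            a[first + cur] = a[first + nxt]
            assignments += 1
            cur = nxt
        a[first + cur] = tmp             # close the cycle
        assignments += 1
    return assignments


a = list(range(7))
print(rotate_juggling(a, 0, 3, 7), a)  # prints: 8 [3, 4, 5, 6, 0, 1, 2]
```

The formula applies to nonempty blocks ($m, n \ge 1$); for an empty block nothing needs to be moved.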

Theorem 3. If we take the rotation algorithm given in [5], then Symmerge requires $O((m + n)\log m)$ element assignments.

Proof. Inside each recursion group $i = 0, 1, \ldots, k$, disjoint parts of $u$ are merged with disjoint parts of $v$. Hence each recursion group $i$ comprises at most $\sum_{j=1}^{2^i} (m_j^i + n_j^i + \gcd(m_j^i, n_j^i)) \le m + n + \sum_{j=1}^{2^i} m_j^i \le 2m + n$ assignments resulting from rotations. So the overall number of assignments for all $k + 1$ recursion groups is at most $(2m + n)(k + 1) = (2m + n)\lfloor \log m \rfloor + 2m + n = O((m + n)\log m)$. □



4 Practical Results 

We did some experimental work with the unfolded version of the Symmerge algorithm (see Sect. 4.1) and compared it with implementations of three other merging algorithms. As the first competitor we chose the merge_without_buffer function contained in the C++ Standard Template Library (STL) [2]. This function implements a modified version of the Recmerge algorithm devised by Dudzinski and Dydek [5]. The STL algorithm operates in a mirrored style compared to Recmerge and does not release one element in each recursion step. Fig. 4 gives a graphical description of the differences. We preferred the STL variant because of its significance in practice.
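For comparison with Symmerge, the following Python sketch is modeled on the structure of the buffer-less merge found in common STL implementations (in libstdc++ it is the internal helper `std::__merge_without_buffer`, an implementation detail rather than a standard API). It is our own illustration: it splits the longer run in the middle, locates the matching cut in the other run by binary search (`bisect_left`/`bisect_right` stand in for `lower_bound`/`upper_bound`), rotates and recurses. The rotation uses slice assignment for brevity.

```python
from bisect import bisect_left, bisect_right


def rotate(a, first, middle, last):
    """Exchange a[first:middle] and a[middle:last]; return the new split point."""
    a[first:last] = a[middle:last] + a[first:middle]
    return first + (last - middle)


def merge_without_buffer(a, first, middle, last):
    """Stable buffer-less merge of the sorted runs a[first:middle] and a[middle:last]."""
    len1, len2 = middle - first, last - middle
    if len1 == 0 or len2 == 0:
        return
    if len1 + len2 == 2:
        if a[middle] < a[first]:
            a[first], a[middle] = a[middle], a[first]
        return
    if len1 > len2:
        # Split the longer (left) run in the middle; find the cut in the right run.
        first_cut = first + len1 // 2
        second_cut = bisect_left(a, a[first_cut], middle, last)
    else:
        # Split the right run in the middle; find the cut in the left run.
        second_cut = middle + len2 // 2
        first_cut = bisect_right(a, a[second_cut], first, middle)
    new_middle = rotate(a, first_cut, middle, second_cut)
    merge_without_buffer(a, first, first_cut, new_middle)
    merge_without_buffer(a, new_middle, second_cut, last)


a = [2, 4, 6, 8, 1, 3, 5, 7]
merge_without_buffer(a, 0, 4, 8)
print(a)  # prints: [1, 2, 3, 4, 5, 6, 7, 8]
```

The difference to Symmerge lies in how the split is chosen: here one cut is fixed in the middle of a run and the other is searched, while Symmerge finds both bounds of the rotation with one symmetric binary search.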

The second competitor was taken from [14], where a simplified implementation of the in-place algorithm of Mannila & Ukkonen [12] is given. Unlike the original algorithm, the simplified version relies on a small external buffer whose length is bounded by the square root of the input size.

As third competitor we took the classical standard algorithm. We chose ran- 
domly generated sequences of integers as input. The results of our evaluation 
Fig. 4. Recmerge versus STL merge_without_buffer



Table 1. Practical comparison of various merging algorithms

              Symmerge          STL_modified      STL_orig  Mannila & Ukkonen   Standard
n     m       #comp     te      #comp     te      te        #comp      te       #comp     te
2^21  2^21    5217738   1831    6020121   2263    3408      7464568    1483     4194177   143
2^21  2^18    1348902   635     1541524   751     1398      5615225    1116     2359159   415
2^21  2^15    264279    298     295063    334     815       5301311    1049     2129765   372
2^21  2^12    45227     211     49272     231     601       4660337    942      2100744   313
2^23  2^9     8198      680     8711      111     2135      14909001   3101     8374604   1503
2^23  2^6     1212      519     1279      613     1254      14297482   2996     8292870   1496
2^23  2^3     170       465     179       548     196       14202829   2961     7544491   1402
2^23  2^0     23        191     24        222     31        14188252   2959     4040739   832

te: execution time in ms; #comp: number of comparisons; m, n: lengths of the input sequences



are contained in Table 1; each entry shows a mean value of 30 runs with different data. We used a state-of-the-art hardware platform with a 2.4 GHz processor and 512 MB main memory; all coding was done in the C programming language. Each comparison was accompanied by a slight artificial delay in order to strengthen the impact of comparisons on the execution time.

It can be read from the table that the STL algorithm seems to be less efficient than Symmerge. However, both algorithms show a clear “$O(m\log(n/m))$ behavior”.
Symmerge and Recmerge rely on rotations as encapsulated operations for element reordering. There are several algorithms that achieve rotations; three of them are presented and evaluated by Bentley in [1]. Bentley’s evaluation showed that a rotation algorithm proposed by Dudzinski and Dydek in [5] is rather slow compared to its alternatives, although it is optimal with respect to the number of assignments. Nevertheless, this algorithm was chosen in the STL for the implementation of rotations. We replaced the rotation algorithm of the merge_without_buffer function (STL) by one of its faster alternatives and compared the modified function with the original. The result of this comparison is given in the columns STL_modified and STL_orig of Table 1. From this comparison it can be read that the chosen rotation algorithm is one of the driving factors regarding the performance of Recmerge, and so of Symmerge.
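One frequently used alternative to the cycle-following rotation is the three-reversal rotation: reverse both blocks, then reverse the whole range. It moves each element at most twice, and all accesses run sequentially through memory. The text does not say which alternative was substituted for STL_modified, so the following sketch (function names are ours) only illustrates the reversal technique and counts the swaps it performs.

```python
def reverse_range(a, i, j):
    """Reverse a[i:j] in place and return the number of swaps performed."""
    swaps = 0
    j -= 1
    while i < j:
        a[i], a[j] = a[j], a[i]
        swaps += 1
        i += 1
        j -= 1
    return swaps


def rotate_reversal(a, first, middle, last):
    """Exchange a[first:middle] and a[middle:last] with three reversals.

    Returns the total number of swaps.
    """
    return (reverse_range(a, first, middle)
            + reverse_range(a, middle, last)
            + reverse_range(a, first, last))


a = list(range(7))
print(rotate_reversal(a, 0, 3, 7), a)  # prints: 6 [3, 4, 5, 6, 0, 1, 2]
```

With a temporary, each swap costs three assignments, so this method performs roughly three times as many assignments as the optimal algorithm; its cache-friendly access pattern can still make it faster in practice.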



4.1 Optimisations on Pseudocode-Level 

We would like to point out two optimisations of Symmerge at the pseudocode level that decrease the execution time without changing the number of comparisons and element assignments carried out. First, “needless recursive calls”, i.e. recursive calls with $m = 0$ or $n = 0$, can be avoided by unfolding the algorithm. For the unfolded version the caller has to ensure that Symmerge is called with $m \ge 1$ and $n \ge 1$. A second optimisation is the treatment of the case $m = 1$ and $n \ge 1$ as loop-driven direct binary insertion. This avoids $\lfloor \log n \rfloor + 1$ recursive calls of the algorithm in this special case.
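The binary-insertion special case can be sketched as follows. The helper name `insert_single` is hypothetical, and Python’s `bisect` functions stand in for a hand-written binary-search loop. The sketch handles a one-element run on either side; `bisect_left` (respectively `bisect_right`) places the single element so that equal elements keep their original order.

```python
from bisect import bisect_left, bisect_right


def rotate(a, first, middle, last):
    """Exchange a[first:middle] and a[middle:last] (slice-based, for brevity)."""
    a[first:last] = a[middle:last] + a[first:middle]


def insert_single(a, first1, first2, last):
    """Merge a[first1:first2] and a[first2:last] when one of the runs has length 1."""
    if first2 - first1 == 1:
        # u = [x]: x goes before the first element of v that is not smaller,
        # so equal elements of v stay behind x.
        pos = bisect_left(a, a[first1], first2, last)
        rotate(a, first1, first2, pos)
    elif last - first2 == 1:
        # v = [y]: y goes after every element of u that is <= y.
        pos = bisect_right(a, a[first2], first1, first2)
        rotate(a, pos, first2, last)
    else:
        raise ValueError("one of the runs must have length 1")


a = [5, 1, 2, 3, 5, 7]
insert_single(a, 0, 1, 6)
print(a)  # prints: [1, 2, 3, 5, 5, 7]
```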

The impact of both optimisations on the execution time depends on the sequences to be merged. In our environment we observed a time reduction of up to 25%.

5 Conclusion 

We presented an efficient minimum storage merging algorithm called Symmerge. Our algorithm uses a novel merging technique that relies on symmetric comparisons as its central operation. We proved that our algorithm is asymptotically optimal regarding the number of necessary comparisons. The practical evaluation showed that it is fast and efficient. So, Symmerge is not only of theoretical but also of practical interest.

Finally, let us note that our algorithm, unlike the standard algorithm, can take advantage of the sortedness of the input sequences. If the overall input sequence is in sorted order, i.e. $u \le v$ for two sequences $u$ and $v$ of sizes $m$ and $n$, Symmerge needs only $O(\log(m + n))$ comparisons and hence becomes sublinear. This in turn carries over to a Symmerge-based merge sort, which speeds up for presorted sequences as well. Mehlhorn showed in [13] that sequences of size $n$ with $O(n)$ inversions, i.e. with $O(n)$ pairs of elements that are not in sorted order, can be sorted in time $O(n)$. How many comparisons does a Symmerge-based merge sort need depending on the number of inversions? We leave this question to further research.
Abstract. A rectangular cartogram is a type of map where every region is a rectangle. The size of the rectangles is chosen such that their areas represent a geographic variable (e.g., population). Good rectangular cartograms are hard to generate: the area specifications for each rectangle may make it impossible to realize correct adjacencies between the regions and so hamper the intuitive understanding of the map.

Here we present the first algorithms for rectangular cartogram construction. Our algorithms depend on a precise formalization of region adjacencies and build upon existing VLSI layout algorithms. Furthermore, we characterize a non-trivial class of rectangular subdivisions for which exact cartograms can be efficiently computed. An implementation of our algorithms and various tests show that in practice, visually pleasing rectangular cartograms with small cartographic error can be effectively generated.



1 Introduction 



Cartograms. Cartograms, which are also referred to as value-by-area maps, are a useful and intuitive tool to visualize statistical data about a set of regions like countries, states or provinces. The size of a region in a cartogram corresponds to a particular geographic variable [3,10]. Since the sizes of the regions are not their true sizes, they generally cannot keep both their shape and their adjacencies. A good cartogram, however, preserves the recognizability in some way.

Fig. 1. The population of Europe (country codes according to ISO 3166).

Broadly speaking, there are three types of cartograms. The standard type (the contiguous area cartogram) has deformed regions so that the desired sizes can be obtained and the adjacencies kept. Algorithms for such cartograms are described in [4,5,7,15]. The second type of cartogram is the non-contiguous area cartogram [11]. The regions have their true shape, but are scaled down and generally do not touch anymore. The third type of cartogram is the rectangular cartogram introduced by Raisz in 1934 [12], where each region is represented by a rectangle. This has the great advantage that the sizes (areas) of the regions can be estimated much better than with the first two types. However, the rectangular shape is less recognizable and it imposes limitations on the possible layout.

Quality criteria. Whether a rectangular cartogram is good is determined by several factors. One of these is the cartographic error [4,5], which is defined for each region as $|A_c - A_s| / A_s$, where $A_c$ is the area of the region in the cartogram and $A_s$ is the specified area of that region, given by the geographic variable to be shown. The following list summarizes all quality criteria:

— Average and maximum cartographic error. 

— Correct adjacencies of the rectangles (e.g., the rectangles for Germany and 
France should be adjacent, and the rectangles for Germany and Spain must 
not be adjacent). 

— Maximum aspect ratio. 

— Suitable relative positions (e.g., the rectangle for The Netherlands should be 
West of the one for Germany). 
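The first criterion can be computed directly from the definition of the cartographic error. A minimal sketch with hypothetical function names and made-up toy areas (not data from the paper):

```python
def cartographic_error(cartogram_area, specified_area):
    """Cartographic error |A_c - A_s| / A_s of a single region."""
    return abs(cartogram_area - specified_area) / specified_area


def error_summary(cartogram_areas, specified_areas):
    """Average and maximum cartographic error over all regions.

    Both arguments map a region identifier to an area.
    """
    errors = [cartographic_error(cartogram_areas[r], specified_areas[r])
              for r in specified_areas]
    return sum(errors) / len(errors), max(errors)


# Toy example: R1 is 10% too large, R2 10% too small, R3 exact.
print(error_summary({"R1": 110, "R2": 90, "R3": 50},
                    {"R1": 100, "R2": 100, "R3": 50}))  # roughly (0.0667, 0.1)
```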

For a purely rectangular cartogram we cannot expect to satisfy all criteria well simultaneously. Figure 2 shows an example where correct adjacencies must be sacrificed in order to get a small cartographic error.

In larger examples, bounding the aspect ratio of the rectangles also results in larger cartographic error.

Related work. Rectangular cartograms are closely related to floor plans for electronic chips and architectural designs. Floor planning aims to represent a planar graph by its rectangular dual, defined as follows. A rectangular partition of a rectangle $R$ is a partition of $R$ into a set $\mathcal{R}$ of non-overlapping rectangles such that no four rectangles in $\mathcal{R}$ meet at the same point. A rectangular dual of a planar graph $G$ is a rectangular partition $\mathcal{R}$ such that (i) there is a one-to-one correspondence between the rectangles in $\mathcal{R}$ and the nodes in $G$, and (ii) two rectangles in $\mathcal{R}$ share a common boundary if and only if the corresponding nodes in $G$ are connected. The following theorem was proven in [9]:

Theorem 1. A planar graph G has a rectangular dual R with four rectangles 
on the boundary of R if and only if 

1. every interior face is a triangle and the exterior face is a quadrangle 

2. G has no separating triangles. 

Although every triangulated planar graph without separating triangles has a rectangular dual, this does not imply that an error-free cartogram for this graph exists. The area specification for every rectangle, as well as other criteria for good cartograms, may make it impossible to realize. Only a careful trade-off between the various types of errors (for example incorrect areas or incorrect adjacencies) makes it possible to visualize certain data sets as rectangular cartograms.
Fig. 2. The values inside the rectangles indicate the preferred areas.
The only algorithm for standard cartograms that can be adapted to handle rectangular cartograms is Tobler’s pseudo-cartogram algorithm [15] combined with a rectangular dual algorithm. Tobler’s pseudo-cartogram algorithm starts with an orthogonal grid superimposed on the map, after which the horizontal and vertical lines of the grid are moved to stretch and shrink the strips in between. Hence, if the input to Tobler’s algorithm is a rectangular partition, then the output is also a rectangular partition. However, Tobler’s method is known to produce a large cartographic error and is mostly used as a preprocessing step for cartogram construction [10]. Furthermore, Tobler himself in a recent survey [14] states that none of the existing cartogram algorithms are capable of generating rectangular cartograms.

Results. We present the first fully automated algorithms for the computation of rectangular cartograms. We formalize the region adjacencies based on their geographic location and are thus able to enumerate and process all feasible rectangular layouts for a particular subdivision (i.e., map). The precise steps that lead us from the input data to an algorithmically processable rectangular subdivision are sketched in Section 2.

We describe three algorithms that compute a cartogram from a rectangular layout. The first is an easy and efficient heuristic, which we evaluated experimentally. The visually pleasing results of our implementation (which compare favorably with existing hand-drawn or computer-assisted cartograms) can be found in Section 6. Secondly, we show how to formulate the computation of a cartogram as a bilinear programming problem (see Section 5).

For our third algorithm we introduce an effective generalization of sliceable layouts, namely L-shape destructible layouts. We prove in Section 3 that the coordinates of an L-shape destructible layout are uniquely determined by the specified areas (if an exact cartogram exists at all for a specific set of area values). The proof immediately implies an efficient algorithm to compute an exact (up to an arbitrarily small error) cartogram for subdivisions that arise from actual maps.

2 Algorithmic Outline 

Assume that we are given an administrative subdivision into a set of regions. 
The regions and adjacencies can be represented by nodes and arcs of a graph F, 
which is the face graph of the subdivision. 

1. Preprocessing: The face graph F is in most cases already triangulated 
(except for its outer face) . In order to construct a rectangular dual of F we have 
to process internal nodes of degree less than four, and triangulate any remaining 
non-triangular faces. Our algorithms will consider every possible triangulation 
of non-triangular faces, since each may lead to a different rectangular dual. 

2. Directed edge labels: Any two adjacent nodes in the face graph have at least one direction of adjacency which follows naturally from their geographic location (for example, The Netherlands lie clearly West of Germany). While in theory there are four different directions of adjacency any two nodes can have, in practice only one or two directions are reasonable (for example, France can be considered to lie North or East of Spain).

Our algorithms go through all possible combinations of direction assignments 
and determine which one gives a correct or the best result. While in theory 
there can be an exponential number of options, in practice there is often only 
one natural choice for the direction of adjacency between two regions. We call a 
particular choice of adjacency directions a directed edge labeling. 

Observation 1. A face graph F with a directed edge labeling can be represented by a rectangular dual if and only if

1. every internal region has at least one North, one South, one East, and one West neighbor, and

2. when traversing the neighbors of any node in clockwise order starting at the westernmost North neighbor, we first encounter all North neighbors, then all East neighbors, then all South neighbors and finally all West neighbors.
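Both conditions of Observation 1 are local to a node and therefore easy to test. The sketch below (function name hypothetical) takes the direction labels ’N’, ’E’, ’S’, ’W’ of an internal node’s neighbors in clockwise order, starting at any neighbor, and accepts exactly the cyclic sequences consisting of one non-empty block of each direction in the order North, East, South, West. Starting the traversal at the first label of the North block gives the order required by condition 2.

```python
def satisfies_observation_1(labels):
    """Check both conditions of Observation 1 at one internal node.

    labels: direction labels 'N', 'E', 'S', 'W' of the node's neighbors in
    clockwise order, starting at an arbitrary neighbor.
    """
    order = "NESW"
    labels = list(labels)
    if set(labels) != set(order):      # condition 1: every direction occurs
        return False
    changes = 0
    for cur, nxt in zip(labels, labels[1:] + labels[:1]):
        if cur != nxt:
            # condition 2: each change must advance to the next direction clockwise
            if order.index(nxt) != (order.index(cur) + 1) % 4:
                return False
            changes += 1
    return changes == 4                # exactly one block per direction


print(satisfies_observation_1("NNESWW"), satisfies_observation_1("NSEW"))  # prints: True False
```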

A realizable directed edge labeling constitutes a regular edge labeling for F as 
defined in [6] which immediately implies our observation. 

3. Rectangular layout: To actually represent a face graph together with a realizable directed edge labeling as a rectangular dual, we have to pay special attention to the nodes on the outer face, since they may miss neighbors in up to three directions. To compensate for that, we add four special regions NORTH, EAST, SOUTH, and WEST, as well as sea regions that help to preserve the original outline of the subdivision. Then we can employ the algorithm by He and Kant [6] to construct a rectangular layout, i.e., the unique rectangular dual of a realizable directed edge labeling. The output of our implementation of the algorithm by He and Kant is shown in Figure 3.

4. Area assignment: For a given set of area values and a given rectangular layout we would like to decide whether an assignment of the area values to the regions is possible without destroying the correct adjacencies. Should the answer be negative or should the question be undecidable, then we still want to compute a cartogram that has a small cartographic error while maintaining reasonable aspect ratios and relative positions.

In this paper we introduce and discuss three methods for computing car- 
tograms from rectangular layouts. We implemented (parts of) our algorithms 
and we present experimental results in Section 6. In the remainder of this sec- 
tion we give a short overview of our algorithms. 
Segment moving heuristic. A simple but efficient heuristic: we iteratively move horizontal and vertical segments of the rectangular layout to reduce the maximum relative error. Additional details can be found in Sections 4 and 6.
Bilinear programming. The computation of a cartogram with the correct ad- 
jacencies and minimum relative error can be formulated as a bilinear program- 
ming problem, which unfortunately is nonconvex. Nevertheless, some non-linear 
programming methods may still be able to solve certain instances of the problem 
because the number of variables and constraints is only linear in the number of 
rectangles. Additional details can be found in Section 5. 

L-sequence algorithm. For certain types of rectangular layouts we can com- 
pute an exact or nearly exact cartogram. For arbitrary layouts it is currently 
unknown if one can decide in polynomial time if an exact cartogram exists. We 
first determine for a given rectangular layout a maximal rectangle hierarchy. 
The maximal rectangle hierarchy groups rectangles that together form a larger 
rectangle, as illustrated in Figure 4. It can be computed in linear time [13]. All 
groups in the hierarchy are independent and we will determine areas separately 
for each group. 




Fig. 4. Rectangular layout of the 12 provinces of the Netherlands and a corresponding 
maximal rectangle hierarchy. 

A node of degree 2 in the hierarchy corresponds to a sliceable group of rectangles. If the maximal rectangle hierarchy consists of slicing cuts only, then we can (in a top-down manner) compute the unique position of each slicing cut. Here we may introduce wrong adjacencies, but this will only happen if there is no correct rectangular cartogram.
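The top-down computation of the slicing cuts can be sketched as follows. We assume, for illustration only, that the sliceable part of the hierarchy is given as a binary slicing tree whose leaves carry the specified areas; the classes `Leaf` and `Cut` and the function `place` are hypothetical. Each cut divides its rectangle in proportion to the total areas of its two sides; as noted above, this fixes the cut positions uniquely but may change adjacencies.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Leaf:
    name: str
    area: float


@dataclass
class Cut:
    direction: str   # "V": vertical cut (left/right children), "H": horizontal (bottom/top)
    first: "Node"    # left or bottom child
    second: "Node"   # right or top child


Node = Union[Leaf, Cut]
Rect = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def total_area(node: Node) -> float:
    if isinstance(node, Leaf):
        return node.area
    return total_area(node.first) + total_area(node.second)


def place(node: Node, x0: float, y0: float, x1: float, y1: float,
          out: Dict[str, Rect]) -> Dict[str, Rect]:
    """Top-down: each cut splits its rectangle in proportion to the children's areas."""
    if isinstance(node, Leaf):
        out[node.name] = (x0, y0, x1, y1)
        return out
    frac = total_area(node.first) / total_area(node)
    if node.direction == "V":
        xm = x0 + frac * (x1 - x0)
        place(node.first, x0, y0, xm, y1, out)
        place(node.second, xm, y0, x1, y1, out)
    else:
        ym = y0 + frac * (y1 - y0)
        place(node.first, x0, y0, x1, ym, out)
        place(node.second, x0, ym, x1, y1, out)
    return out


tree = Cut("V", Leaf("a", 2.0), Cut("H", Leaf("b", 1.0), Leaf("c", 3.0)))
rects = place(tree, 0.0, 0.0, total_area(tree), 1.0, {})
for name, (x0, y0, x1, y1) in rects.items():
    print(name, (x1 - x0) * (y1 - y0))   # equals the specified area up to rounding
```

Recomputing `total_area` at every node keeps the sketch short; caching the subtree areas would make it linear in the number of rectangles.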

Nodes of a degree higher than 2 (which is necessarily at least 5) require more complex cuts (see for example the four thick segments in Fig. 4). One of the main contributions of this paper is the characterization of a type of non-sliceable layout for which the coordinates are still uniquely determined by the specified areas. We describe an efficient algorithm to compute a cartogram for L-shape destructible layouts (see Section 3 for a precise definition).

L-shape destructible layouts are a natural generalization of sliceable layouts with a clear practical value. For example, the rectangular layout of the 48 contiguous states of the US depicted in Figure 3 is not sliceable, but L-shape destructible. Figure 5 shows the smallest non-sliceable layout, as well as the smallest non-L-shape destructible layout.

Fig. 5. Smallest non-sliceable layout (left), smallest non-L-shape destructible layout (right).

In Section 3 we show that it is difficult to solve the computations for L-shape 
destructible layouts analytically. But it is possible to produce a cartogram based 
on one guessed value (coordinate) and decide whether the initial choice was too 
large or too small in linear time. 

3 L-Shape Destructible Layouts 

In this section we show that for certain non-sliceable rectangular layouts we can determine an exact or nearly exact rectangular cartogram efficiently (if one exists). Recall that a rectangular layout is a partition of a rectangle $R$ into a set $\mathcal{R}$ of non-overlapping rectangles. We call a rectangular layout $\mathcal{R}$ irreducible if no proper subset of $\mathcal{R}$ (of size $> 1$) forms a rectangle. Furthermore, we call a rectilinear simple polygon with at most 6 vertices L-shaped. We say that an L-shaped polygon is rooted at a vertex $p$ if one of its convex vertices is $p$.

Definition 1 (L-shape destructible). An irreducible rectangular layout $\mathcal{R}$ of a rectangle $R$ is L-shape destructible if there is a sequence starting at a corner $s$ of $R$ in which the rectangles of $\mathcal{R}$ can be removed from $R$ such that the remainder forms an L-shaped polygon rooted at the corner $t$ of $R$ opposite to $s$ after each removal.

Note that the removal sequence is not necessarily unique. However, an incremental algorithm cannot make a bad decision, so the sequence can easily be computed in linear time. See Figure 6 for an example of a removal sequence for an irreducible layout obtained from the maximal rectangle hierarchy of the rectangular layout of the Eastern United States depicted in Figure 3.

The easiest type of non-sliceable but L-shape destructible layouts are windmill layouts (see Fig. 7 (left)). For a given set of area values $\{A_1, \ldots, A_5\}$ we can compute the (unique) realization analytically, because this only requires finding roots of a polynomial of degree 2. We can scale the axes of any instance such that the outer rectangle has coordinates $(0,0)$, $(0,1)$, $(A,0)$, and $(A,1)$, where $A = A_1 + A_2 + A_3 + A_4 + A_5$. Then we get the equations $x_1 \cdot y_1 = A_1$, $x_2 \cdot (1 - y_1) = A_2$, $(x_2 - x_1) \cdot (y_1 - y_2) = A_3$, $(x_3 - x_1) \cdot y_2 = A_4$, and $(x_3 - x_2) \cdot (1 - y_2) = A_5$, where also $x_3 = A$. Solving these equations with respect to, say, $y_1$, we obtain a quadratic equation in $y_1$. A windmill layout always has a realization as a cartogram, independent of the area values, which follows from the proof of Theorem 2.
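The quadratic can be made explicit. We assume (our choice, since the figure is not reproduced here) that the five rectangles are placed as $R_1 = [0,x_1]\times[0,y_1]$, $R_2 = [0,x_2]\times[y_1,1]$, $R_3 = [x_1,x_2]\times[y_2,y_1]$, $R_4 = [x_1,A]\times[0,y_2]$ and $R_5 = [x_2,A]\times[y_2,1]$, which reproduces exactly the five equations above. Substituting $x_1 = A_1/y_1$, $x_2 = A_2/(1-y_1)$ and $y_2 = A_4/(A - x_1)$ into the equation for $A_3$ and clearing denominators gives, by our own derivation, $A(A_1+A_2+A_3)\,y_1^2 - \big((A_1+A_2)(A_1+A_4) + A_1 A + A_3(A+A_1)\big)\,y_1 + A_1(A_1+A_3+A_4) = 0$. The hypothetical function `windmill` below solves this equation and keeps the root that yields a valid layout.

```python
import math


def windmill(a1, a2, a3, a4, a5):
    """Return (x1, x2, y1, y2) of a windmill cartogram in [0, A] x [0, 1].

    Assumed placement: R1 = [0,x1]x[0,y1], R2 = [0,x2]x[y1,1],
    R3 = [x1,x2]x[y2,y1], R4 = [x1,A]x[0,y2], R5 = [x2,A]x[y2,1].
    """
    total = a1 + a2 + a3 + a4 + a5
    s, t = a1 + a2, a1 + a4
    # Coefficients of the quadratic in y1 obtained from the R3 equation.
    qa = total * (s + a3)
    qb = -(s * t + a1 * total + a3 * (total + a1))
    qc = a1 * (t + a3)
    root = math.sqrt(qb * qb - 4 * qa * qc)
    for y1 in ((-qb + root) / (2 * qa), (-qb - root) / (2 * qa)):
        if not 0 < y1 < 1:
            continue
        x1, x2 = a1 / y1, a2 / (1 - y1)
        if not 0 < x1 < x2 < total:
            continue
        y2 = a4 / (total - x1)
        if 0 < y2 < y1:                  # the layout is a proper windmill
            return x1, x2, y1, y2
    raise ValueError("no valid windmill realization found")


x1, x2, y1, y2 = windmill(1, 2, 3, 4, 5)
A = 15
print([x1 * y1, x2 * (1 - y1), (x2 - x1) * (y1 - y2), (A - x1) * y2, (A - x2) * (1 - y2)])
```

The printed areas reproduce 1, 2, 3, 4, 5 up to floating-point rounding.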

Already for only slightly more complex types, for example Figure 7 (right), we 
cannot determine the coordinates exactly anymore, because this requires finding 
the roots of a polynomial of degree 6. Although the degree of the polynomials 
suggests that there may be more than one solution, we show that every L-shape 
destructible layout has at most one unique realization as a cartogram for a given 
set of area values. Furthermore we can compute the coordinates of this realization 
with an easy, iterative method. 
Fig. 6. An L-shape destructible layout of the Eastern US. The shaded area shows the L-shaped polygon after the removal of rectangles 1 to 4.

Fig. 7. A windmill layout (left) and a slightly more complex L-shape destructible layout (right).
Theorem 2. An L-shape destructible layout with a given set of area values has exactly one or no realization as a cartogram.

Proof (sketch). W.l.o.g. we assume that the L-shape destruction sequence begins with a rectangle $R_1$ that contains the upper left corner of $R$. We denote the height of $R_1$ by $y_1$ (see Fig. 8) and the L-shaped polygon that remains after the removal of $R_1$ by $L_1$.

If we consider the edge lengths of $L_1$ to be functions of $y_1$, then we can observe that these functions are either monotonically increasing (denoted by a plus sign near the edge) or monotonically decreasing (denoted by a minus sign near the edge) in $y_1$. (Note here that the lengths of the right and bottom edges in Fig. 8 are actually fixed. We simply declare them to be monotonically increasing and decreasing, respectively.) In fact, we will prove a much stronger statement: the lengths of the edges of the L-shaped polygons remaining after each removal of a rectangle in an L-shape destruction sequence are monotone functions in $y_1$.

Clearly this statement is true after the removal of $R_1$: $L_1$ is of the L-shape type depicted in the upper left corner of Figure 9. We will now argue by means of a complete case analysis that during a destruction sequence only L-shaped polygons of the three types depicted in the left column of Figure 9 can ever occur. Note that the type of an L-shaped polygon also describes which of its edges are monotonically increasing or decreasing in $y_1$.

For each type of L-shaped polygon there is only a constant number of possible successors in the destruction sequence, all of which are depicted in Figure 9. We invite the reader to carefully study this figure to convince him- or herself that it indeed depicts a complete coverage of all possibilities. Note that we never need to know exactly which functions describe the edge lengths of the L-shaped polygons; it is sufficient to know that they are always monotone in $y_1$.

Fig. 8. Setting the height of the first rectangle in an L-shape destruction sequence.

Fig. 9. Removing a rectangle from a rooted L-shaped polygon. The plus and minus signs next to an edge indicate that its length is a monotonically increasing or monotonically decreasing function of $y_1$.

During the destruction sequence it is possible that we realize that our initial choice of $y_1$ was incorrect. For example, a rectangle $R_i$ should be placed according to the layout as in the second case of the top row of Figure 9, but the area of $R_i$ is too small and it would be placed as in the first case of the top row. Then the signs of the edges show that $y_1$ must be increased to obtain correct adjacencies. Should no further increase of $y_1$ be possible, then we know that the correct adjacencies cannot be realized for this particular layout and area values.

As soon as the L-shaped polygon consists of only two rectangles, we can make a decision concerning our initial choice of $y_1$. The edge lengths of one of the two rectangles are either both monotonically increasing or both monotonically decreasing. A comparison of its current area with its prescribed area tells us whether we have to increase or decrease $y_1$.

A binary search on $y_1$ either leads to a unique solution or we recognize that the rectangular layout cannot be realized as a cartogram with correct areas. □

4 Segment Moving Heuristic 

A very simple but effective heuristic to obtain a rectangular cartogram from a 
rectangular layout is the following. Consider the maximal vertical segments and 
maximal horizontal segments in the layout, for example the vertical segment 
in Figure 3 that has Kentucky (KY) to its left and West Virginia (WV) and 
Virginia (VA) to its right. This segment can be moved a little to the left, making 
Kentucky smaller and the Virginias larger, or it can be moved to the right with 
the opposite effect. 

The segment moving heuristic loops over all maximal segments and moves
each by a small step in the direction that decreases the maximum error of the
adjacent regions. After a number of iterations, one can expect that all maximal
segments have moved to a locally optimal position. However, we have no proof
that the method reaches the global optimum, or even that it converges.

The segment moving heuristic has some important advantages: (i) it can be
used for any rectangular layout, (ii) one iterative step for all maximal segments
takes $O(n)$ time, (iii) no area needs to be specified for sea rectangles, (iv) a bound
on the aspect ratio can be specified, and (v) adjacencies between the rectangles
can be preserved, but need not be. Not preserving adjacencies can help to reduce
cartographic error.
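The heuristic can be illustrated on the simplest possible layout. The C sketch below is a toy, not the authors' implementation: it assumes a strip of unit height divided into K columns by vertical maximal segments, measures the error of a region as its relative area error |A - A_R| / A_R, and moves each interior segment by a fixed step. The names K, col_error, adj_error and sweep are hypothetical.

```c
#define K 3  /* number of columns in the toy layout (assumption) */

/* Relative error of column i in a strip of unit height:
   |area - target| / target, where area = x[i+1] - x[i]. */
static double col_error(const double x[], const double target[], int i) {
    double d = (x[i + 1] - x[i]) - target[i];
    if (d < 0) d = -d;
    return d / target[i];
}

/* Maximum error of the two regions adjacent to interior segment s. */
static double adj_error(const double x[], const double target[], int s) {
    double a = col_error(x, target, s - 1);
    double b = col_error(x, target, s);
    return a > b ? a : b;
}

/* One iteration: try moving each interior vertical segment one step left
   or right and keep whichever position gives the smallest adjacent error. */
static void sweep(double x[], const double target[], double step) {
    for (int s = 1; s < K; s++) {
        double orig = x[s], best_x = orig, best = adj_error(x, target, s);
        for (int dir = -1; dir <= 1; dir += 2) {
            double cand = orig + dir * step;
            if (cand <= x[s - 1] || cand >= x[s + 1]) continue; /* keep widths positive */
            x[s] = cand;
            double e = adj_error(x, target, s);
            if (e < best) { best = e; best_x = cand; }
        }
        x[s] = best_x;
    }
}
```

Calling sweep repeatedly on x = {0, 1, 2, 3} with targets {0.5, 1.0, 1.5} drives the two interior segments toward 0.5 and 1.5. An aspect-ratio bound, as in item (iv), could be enforced by rejecting violating candidate positions at the same place where the sketch rejects non-positive widths.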

5 Bilinear Programming 

Once a rectangular layout is fixed, the rectangular cartogram problem can be
formulated as an optimization problem. The variables are the x-coordinates of the
vertical segments and the y-coordinates of the horizontal segments. For proper
rectangular layouts (no four-rectangle junctions, inside an outer rectangle) with
n rectangles, there are $n - 1$ variables. The area of each rectangle is determined
by four of the variables.

We can formulate the minimum error cartogram problem as a bilinear
program. The rectangle constraints for a rectangle R are:

$$(x_j - x_i)\cdot(y_l - y_k) \ge (1 - \varepsilon)\cdot A_R,$$

$$(x_j - x_i)\cdot(y_l - y_k) \le (1 + \varepsilon)\cdot A_R,$$

where $x_i$ and $x_j$ are the x-coordinates of the left and right segment, $y_k$ and $y_l$
are the y-coordinates of the bottom and top segment, $A_R$ is the specified area of
R, and $\varepsilon$ is the cartographic error for R. An additional $O(n)$ linear constraints
are needed to preserve the correct adjacencies of the rectangles (e.g., along
sliceable cuts). A bounded aspect ratio (height versus width) of all rectangles can
also be guaranteed with linear constraints. The objective is to minimize $\varepsilon$. The
formulation is a quadratic programming problem with quadratic constraints, some
of them non-convex, and a linear objective function. Since there are no
$x_i^2$ or $y_j^2$ terms, only $x_i y_j$ terms, it is a bilinear programming problem.
Several approaches exist that can handle such problems, but they do not come with
any guarantees [1].
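Taken together, the two constraints for R say $|(x_j - x_i)(y_l - y_k) - A_R| \le \varepsilon A_R$, so once all coordinates are fixed the best $\varepsilon$ is simply the largest relative area error. The C sketch below evaluates that objective for given coordinates; it does not solve the bilinear program (choosing the coordinates is the hard part). The Rect encoding and the name min_epsilon are assumptions made for illustration.

```c
/* A rectangle R given by the indices of its bounding segments
   (hypothetical encoding of a fixed rectangular layout). */
typedef struct {
    int left, right;   /* indices into xs: x_i and x_j */
    int bottom, top;   /* indices into ys: y_k and y_l */
    double area;       /* specified area A_R */
} Rect;

/* For fixed coordinates, the smallest eps such that every rectangle satisfies
   (1 - eps) A_R <= (x_j - x_i)(y_l - y_k) <= (1 + eps) A_R. */
double min_epsilon(const double xs[], const double ys[], const Rect rs[], int n) {
    double eps = 0.0;
    for (int r = 0; r < n; r++) {
        double a = (xs[rs[r].right] - xs[rs[r].left]) *
                   (ys[rs[r].top] - ys[rs[r].bottom]);
        double e = (a - rs[r].area) / rs[r].area;  /* relative error */
        if (e < 0) e = -e;
        if (e > eps) eps = e;
    }
    return eps;
}
```

For example, two side-by-side rectangles with xs = {0, 1, 3}, ys = {0, 2} and specified areas 2 and 5 have actual areas 2 and 4, giving $\varepsilon = 0.2$.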

6 Implementation and Experiments 

We have implemented the segment moving heuristic and tested it on several data
sets. The main objective was to discover whether rectangular cartograms with
reasonably small cartographic error exist, given that rectangles are rather restrictive
in their ability to represent all region areas correctly. Obviously, we can
only answer this question if the segment moving heuristic actually finds a good
cartogram when one exists. Secondary objectives of the experiments were to determine
to what extent the cartographic error depends on maximum aspect ratio and



On Rectangular Cartograms 733 



Table 1. Errors for different aspect ratios and sea percentages (correct
adjacencies).

Data set   Sea   Asp.   Av. error   Max. error
Eu elec.   20%    8     0.071       0.280
Eu elec.   20%    9     0.070       0.183
Eu elec.   20%   10     0.067       0.179
Eu elec.   20%   11     0.065       0.155
Eu elec.   20%   12     0.054       0.137
Eu elec.   10%   10     0.098       0.320
Eu elec.   15%   10     0.076       0.245
Eu elec.   20%   10     0.067       0.179
Eu elec.   25%   10     0.049       0.126




Fig. 10. A cartogram depicting the electricity production of Europe.



correct or false adjacencies. We were also interested in the dependency of the 
error on the outer shape, or, equivalently, the number of added sea rectangles. 

Our layout data sets consist of the 12 provinces of the Netherlands (NL),
36 countries of Europe, and the 48 contiguous states of the USA. The results
pertaining to the NL data set can be found in the full version of the paper.
For Europe, we joined Belgium and Luxembourg, and Ukraine and Moldova,
because rectangular duals do not exist if Luxembourg or Moldova is included
as a separate country. Europe has 16 sea rectangles and the US data set has 9.
For Europe we allowed 10 pairs of adjacent countries to take either of their two
possible relative positions, leading to 1024 possible layouts. Of these, 768
correspond to a realizable directed edge labeling. For the USA we have 13 pairs,
8192 possible layouts, and 4608 of these are realizable. In the experiments, all
768 or 4608 layouts are considered and the one giving the lowest average error is
chosen as the cartogram.
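With k flexible pairs there are $2^k$ candidate layouts ($2^{10} = 1024$ for Europe, $2^{13} = 8192$ for the USA). A minimal C sketch of the exhaustive search, assuming two hypothetical callbacks: one deciding whether a bitmask gives a realizable directed edge labeling, and one returning the average error of the resulting cartogram (for instance after running the segment moving heuristic). Neither callback is specified in this section.

```c
#include <float.h>

/* Hypothetical callbacks supplied by the caller. */
typedef int (*realizable_fn)(unsigned mask, void *ctx);
typedef double (*avg_error_fn)(unsigned mask, void *ctx);

/* Try all 2^k layouts; bit i of mask fixes the relative position of
   flexible pair i. Returns the best mask (or -1 if none is realizable),
   its average error in *best_err and the number of realizable layouts in
   *num_realizable. Assumes k < 32. */
long best_layout(int k, realizable_fn ok, avg_error_fn err, void *ctx,
                 double *best_err, unsigned *num_realizable) {
    long best = -1;
    *best_err = DBL_MAX;
    *num_realizable = 0;
    for (unsigned mask = 0; mask < (1u << k); mask++) {
        if (!ok(mask, ctx)) continue;     /* skip unrealizable labelings */
        (*num_realizable)++;
        double e = err(mask, ctx);
        if (e < *best_err) { *best_err = e; best = (long)mask; }
    }
    return best;
}
```

Each realizable layout is evaluated independently, so the cost of this search is dominated by the number of realizable layouts times the cost of one evaluation.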

As numeric data we considered the population and the electricity production
for Europe, taken from [2]. For the USA we considered population, native
population, number of farms, number of electoral college votes, and total length
of highways. The data are provided by the US Census Bureau in the Statistical
Abstract of the United States.^

Preliminary tests on all data sets showed that the false adjacency option
always gives considerably lower error than requiring correct adjacencies. The false
adjacency option always allowed cartograms with an average error of only a few
percent. A small part of the errors is due to the discrete steps taken when moving
the segments. Since cartograms are interpreted visually and show a global picture,
errors of a few percent on average are acceptable. Errors of a few percent
are also present in standard, computer-generated contiguous cartograms
[4,5,7,8]. We note that most hand-made rectangular cartograms also have false
adjacencies, and aspect ratios of more than 20 can be observed.



^ http://www.census.gov/statab/www/




734 



M. van Kreveld and B. Speckmann 



Table 1 shows errors for various settings for the electricity production data
set. The rectangular layout chosen for the table is the one with lowest average
error; the corresponding maximum error is shown only for completeness. In the
table we observe that the error goes down with a larger aspect ratio, as expected.
For Europe and population, errors below 10% on average with correct adjacencies
were only obtained for aspect ratios greater than 15. The table also
shows that a larger sea percentage brings the error down. This is as expected,
because sea rectangles can grow or shrink to reduce the error of adjacent countries:
more sea means more freedom to reduce error. However, sea rectangles must not
become so small that they (nearly) disappear visually.

Table 2 shows errors for various settings for two US data sets. Again, we 
choose the rectangular layout giving the lowest average error. In the US highway 
data set, aspect ratios above 7 do not seem to decrease the error below a certain 
value. In the US population data set, correct adjacencies give a larger error than 
is acceptable. Even an aspect ratio of 40 gave an average error of over 0.3. We 
ran the same tests for the native population data and again observed that the 
error decreases with larger aspect ratio. An aspect ratio of 7 combined with false 
adjacency gives a cartogram with average error below 0.04 (see Fig. 11). Only the 
highways data allowed correct adjacencies, small aspect ratio, and small error 
simultaneously. 

Figure 11 shows a rectangular cartogram of the US for one of the five data
sets. Additional figures can be found in the full version of the paper. Three
of them have false adjacencies, but we can observe that adjacencies are only
slightly disturbed in all cases (as in hand-made rectangular cartograms).
For all data sets except the farms data, an aspect ratio of 10 or lower yielded an
average error between 0.03 and 0.06; for the farms data, an aspect ratio of 20
gives an average error just below 0.1. Figures 1 and 10 show rectangular
cartograms for Europe. The former has false adjacencies and aspect ratio
bounded by 12, the latter has correct adjacencies and aspect ratio bounded by
8. The average error is roughly 0.06 in both cartograms.



Table 2. Errors for different aspect ratios, and correct/false adjacencies.
Sea 20%.

Data set   Adj.      Asp.   Av. error   Max. error
US pop.    false      8     0.104       0.278
US pop.    false     10     0.052       0.295
US pop.    false     12     0.022       0.056
US pop.    correct   12     0.327       0.618
US pop.    correct   14     0.317       0.612
US pop.    correct   16     0.308       0.612
US hw.     correct    6     0.073       0.188
US hw.     correct    8     0.058       0.101
US hw.     correct   10     0.058       0.101




Fig. 11. A cartogram depicting the native 
population of the United States. 
7 Conclusion

In this paper we presented the first algorithms to compute rectangular car- 
tograms. We showed how to formalize region adjacencies in order to generate 
algorithmically processable layouts. We also characterized the class of L-shape 
destructible layouts and showed how to efficiently generate exact or nearly exact 
cartograms for them. An interesting open question in this context is whether 
non-sliceable, not L-shape destructible layouts have a unique solution as well. 
Another open problem is whether rectangular cartogram construction (correct 
or minimum error) can be done in polynomial time. 

We experimentally studied the quality of our segment moving heuristic and 
showed that it is very effective in producing aesthetic rectangular cartograms 
with only small cartographic error. Our tests show the dependency of the error 
on the aspect ratio, correct adjacencies, and sea percentage. Our implementation 
tests many different possible layouts, which helps considerably to achieve low 
error. We will extend our implementation and perform more tests in the near 
future to evaluate our methods further. 
Abstract. Modern multiprocessor systems offer advanced synchronization
primitives, built in hardware, to support the development of efficient
parallel algorithms. In this paper we develop a simple and efficient
algorithm for atomic registers (variables) of arbitrary length. The simplicity
and better complexity of the algorithm are achieved via the utilization of
two such common synchronization primitives. In this paper we also evaluate
the performance of our algorithm and the performance of a practical
previously known algorithm that is based only on read and write primitives.
The evaluation is performed on 3 well-known parallel architectures.
This evaluation clearly shows that both algorithms are practical
and that, as the size of the register increases, our algorithm performs
better, in accordance with its complexity behavior.



1 Introduction 

In multiprocessing systems cooperating processes may share data via shared data 
objects. In this paper we are interested in designing and evaluating the perfor- 
mance of shared data objects for cooperative tasks in multiprocessor systems. 
More specifically we are interested in designing a practical wait-free algorithm 
for implementing registers (or memory words) of arbitrary length that could be 
read and written atomically. (Typical modern multiprocessor systems support 
words of 64-bit size.) 

The most commonly required consistency guarantee for shared data objects
is atomicity, also known as linearizability. An implementation of a shared object
is atomic or linearizable if it guarantees that even when operations overlap in
time, each of them appears to take effect at an atomic time instant which lies
in its respective time duration, in a way that the effect of each operation is in
agreement with the object’s sequential specification. For read/write objects, for
example, the latter means that the value returned by each read equals the
value written by the most recent write according to the sequence of “shrunk”
operations in the time axis.

* This work was supported by computational resources provided by the Swedish Na- 
tional Supercomputer Centre (NSC). 
The classical, well-known and simplest solution for maintaining consistency
of shared data objects enforces mutual exclusion. Mutual exclusion protects the
consistency of the shared data by allowing only one process at a time to access it.
Mutual exclusion causes large performance degradation, especially in
multiprocessor systems [1], and suffers from potential priority inversion, in which a
high-priority task can be blocked for an unbounded time by a lower-priority task [2].

Non-blocking implementation of shared data objects is an alternative
approach to the problem of inter-task communication. Non-blocking mechanisms
allow multiple tasks to access a shared object at the same time, but without
enforcing mutual exclusion to accomplish this. They offer significant advantages
over lock-based schemes because i) they do not suffer from priority inversion;
ii) they avoid lock convoys; iii) they provide high fault tolerance (processor
failures will never corrupt shared data objects); and iv) they eliminate deadlock
scenarios involving two or more tasks each waiting for locks held by another.

Non-blocking algorithms can be lock-free or wait-free. Lock-free implementations
guarantee that regardless of the contention and the interleaving of concurrent
operations, at least one operation will always make progress. However, there
is a risk that the progress of other operations might cause one specific operation
to take unbounded time to finish. In a wait-free algorithm, every operation is
guaranteed to finish in a limited number of steps, regardless of the actions of
the concurrent operations. Non-blocking algorithms have been shown to be of
great practical importance [3,4], and recently NOBLE, a non-blocking
inter-process communication library, has been introduced [5].

The problem of multi-word wait-free read/write registers is one
of the well-studied problems in the area of non-blocking
synchronization [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. The main goal of
the algorithms in these results is to construct wait-free multi-word
read/write registers using single-word read/write registers and not other
synchronization primitives which may be provided by the hardware in a system.
This has been very significant, providing fundamental results in the area of
wait-free synchronization, especially when we consider the now well-known
and well-studied hierarchy of shared data objects and their synchronization
power [22]. Many of these solutions also involve elegant and symmetric ideas
and have formed the basis for further results in the area of non-blocking
synchronization.

Our motivation for further studying this problem is as follows. As the
aforementioned solutions use only read/write registers as components, each write
operation on the multi-word register necessarily writes the new value in several
copies (roughly speaking, as many copies as there are readers in the system), which
may be costly. However, modern architectures provide hardware synchronization
primitives stronger than atomic read/write registers, some of them even accessible
at a constant cost-factor away from read/write accesses. We consider it a useful
task to investigate how to use this power to design economical solutions for the
same problem, which can lead to structures that are more suitable in practice.
To the best of our knowledge, none of the previous solutions have been
implemented and evaluated on real systems.

In this paper we present a simple, efficient wait-free algorithm for
implementing multi-word n-reader/single-writer registers of arbitrary word length.
In the new algorithm each multi-word write operation only needs to write the new
value in one copy, thus having significantly less overhead. To achieve this, the
algorithm uses synchronization primitives called fetch-and-or and swap [1], which
are available in several modern processor architectures, to synchronize n readers
and a writer accessing the register concurrently. Since the new algorithm is
wait-free, it provides high parallelism for the accesses to the multi-word register and
thus significantly improves performance. We compare the new algorithm with
the wait-free one in [23], which is also practical and simple, and with two
lock-based algorithms, one using a single spin-lock and one using a readers-writer
spin-lock. We designed benchmarks to test them on three different architectures:
UMA SunFire 880 with 6 processors, ccNUMA SGI Origin 2000 with 29 processors and
ccNUMA SGI Origin 3800 with 128 processors.

The rest of this paper is organized as follows. In Section 2 we describe the 
formal requirements of the problem and the related algorithms that we are using 
in the evaluation study. Section 3 presents our protocol. In Section 4, we give 
the proof of correctness of the new protocol. Section 5 is devoted to the perfor- 
mance evaluation. The paper concludes with Section 6, with a discussion on the 
contributed results and further research issues. 



2 Background 

System and Problem Model. A shared register of arbitrary length [23,24] is 
an abstract data structure that is shared by a number of concurrent processes 
which perform read or write operations on the shared register. In this paper we 
make no assumption about the relative speed of the processes, i.e. the processes 
are asynchronous. One of the processes, the writer, executes write operations and 
all other processes, the readers, execute read operations on the shared register. 
Operations performed by the same process are assumed to execute sequentially. 

An implementation of a register consists of: (i) protocols for executing the
operations (read and write); (ii) a data structure consisting of shared
subregisters; and (iii) a set of initial values for these. The protocols for the
operations consist of a sequence of operations on the subregisters, called suboperations.
These suboperations are reads, writes or other atomic primitives, such as
fetch-and-or or swap, which are either available directly on modern multiprocessor
systems or can be implemented from other available synchronization primitives
[22]. Furthermore, matching the capabilities of modern multiprocessor systems,
the subregisters are assumed to be atomic and to support multiple processes.

A register implementation is wait-free [22] if it guarantees that any process 
will complete each operation in a finite number of steps (suboperations) regard- 
less of the execution speeds of the other processes. 
For each operation O there exists a time interval $[s_O, f_O]$ called its duration,
where $s_O$ and $f_O$ are the starting and ending times, respectively. We assume
that there is a precedence relation on the operations that forms a strict partial
order (denoted $\to$). For two operations a and b, $a \to b$ means that operation a
ended before operation b started. If two operations are incomparable under $\to$,
they are said to overlap.

A reading function $\pi$ for a register is a function that assigns a high-level write
operation w to each high-level read operation r such that the value returned by
r is the value that was written by w (i.e. $\pi(r)$ is the write operation that wrote
the value that the read operation r read and returned).

Criterion 1. A shared register is atomic iff the following three conditions hold
for all possible executions:

1. No-irrelevant. There exists no read r such that $r \to \pi(r)$.

2. No-past. There exists no read r and write w such that $\pi(r) \to w \to r$.

3. No N-O inversion. There exist no reads $r_1$ and $r_2$ such that $r_1 \to r_2$ and
$\pi(r_2) \to \pi(r_1)$.
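The three conditions can be checked mechanically on a finite recorded history. A minimal C sketch, assuming each high-level operation is logged as a duration [s, f] with globally comparable timestamps and that the reading function is given as an array pi (read i returned the value of write pi[i]); the names op_interval, precedes and satisfies_criterion1 are hypothetical.

```c
#include <stdbool.h>

/* Duration [s, f] of a logged high-level operation (hypothetical trace format). */
typedef struct { double s, f; } op_interval;

/* a -> b: operation a ended before operation b started. */
static bool precedes(op_interval a, op_interval b) { return a.f < b.s; }

/* Checks Criterion 1 on a finite history. reads[i] returned the value of
   writes[pi[i]], i.e. pi is the reading function. */
bool satisfies_criterion1(const op_interval reads[], const int pi[], int nr,
                          const op_interval writes[], int nw) {
    for (int i = 0; i < nr; i++) {
        op_interval r = reads[i], wr = writes[pi[i]];
        if (precedes(r, wr)) return false;                      /* no-irrelevant */
        for (int k = 0; k < nw; k++)                            /* no-past */
            if (precedes(wr, writes[k]) && precedes(writes[k], r)) return false;
        for (int j = 0; j < nr; j++)                            /* no N-O inversion */
            if (precedes(r, reads[j]) && precedes(writes[pi[j]], wr)) return false;
    }
    return true;
}
```

For instance, with writes at [0, 3] and [4, 10], a read at [5, 6] may return the second value and a later read at [7, 8] the first value without violating no-past (the second write still overlaps the later read), yet the pair is an N-O inversion and the checker rejects it.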

Peterson’s Shared Multi-Word Register. In [23] Peterson describes an
implementation of an atomic shared multi-word register for one writer and many
readers. The protocol uses no atomic suboperations other than reads and
writes and is described below.

The idea is to use n + 2 shared buffers, which each can hold a value of the 
register, together with a set of shared handshake variables to make sure that 
the writer does not overwrite a buffer that is being read by some reader and 
that each reader chooses a stable but up-to-date buffer to read from. The shared 
variables are shown in Fig. 1 and the protocols for the read and write operations 
are shown in Fig. 2. 

Peterson’s implementation is simple and efficient in most cases; however, a
high-level write operation potentially has to write n + 2 copies of the new value
and all high-level reads read at least two copies of the value, which can be quite
expensive when the register is large. Our new register implementation uses the
stronger atomic suboperations fetch-and-or and swap to implement high-level read
and write operations that only need to read or write one copy of the register value.

We have decided to compare our method with this algorithm because: (i)
they are both designed for the 1-writer n-reader shared register problem; and (ii)
compared to other, more general solutions based on weaker subregisters (which
are much weaker than what common multiprocessor machines provide), this one
involves the least communication overhead among the processes, without
requiring unbounded timestamps or methods to bound the unbounded version.
Mutual-Exclusion Based Solutions. For comparison we also evaluate the 
performance of two mutual-exclusion-based register implementations, one that 
uses a single spin-lock with exponential back-off and another that uses a readers- 
writers spin-lock [1] with exponential back-off to protect the shared register. The 
readers-writers spin-lock is similar to the spin-lock but allows readers to access 
the register concurrently with other readers. 
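For reference, a test-and-set spin-lock with exponential back-off can be sketched with C11 atomics as below. The back-off constants and the names backoff_lock, backoff_lock_acquire and backoff_lock_release are assumptions made here; the implementations actually benchmarked follow [1] and may differ, and the readers-writers variant is not shown. In a lock-based register, the writer and each reader would copy the m-word value while holding the lock.

```c
#include <stdatomic.h>

/* Back-off bounds in busy-wait iterations (illustrative values, not from the paper). */
#define BACKOFF_MIN 4u
#define BACKOFF_MAX 4096u

typedef struct { atomic_flag held; } backoff_lock;

/* Test-and-set spin-lock: on each failed attempt, wait and double the delay. */
static void backoff_lock_acquire(backoff_lock *l) {
    unsigned delay = BACKOFF_MIN;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        for (volatile unsigned i = 0; i < delay; i++) {
            /* busy wait; volatile keeps the loop from being optimized away */
        }
        if (delay < BACKOFF_MAX) delay *= 2;  /* exponential back-off */
    }
}

static void backoff_lock_release(backoff_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock is initialized with `backoff_lock l = { ATOMIC_FLAG_INIT };`. The back-off reduces traffic on the lock word under contention, but a preempted lock holder still blocks every other process, which is the drawback the non-blocking designs avoid.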
Variable   Type                 Description
WFLAG      Boolean              Indicates that the writer is writing in BUF1.
SWITCH     Boolean              To check if the writer has written in BUF1.
READING    Array of n Boolean   Used together with WRITING to handle concurrent reads.
WRITING    Array of n Boolean   Used together with READING to handle concurrent reads.
BUF1       Buffer               Main buffer.
BUF2       Buffer               Backup buffer.
COPYBUF    Array of n buffers   An individual buffer copy for each reader.



Fig. 1. The shared variables used by Peterson’s algorithm. The number of readers is
n. BUF1 holds the initial register value. All other variables are initialized to 0 or false.



Read operation by reader r.

READING[r] = !WRITING[r];
flag1 = WFLAG;
sw1 = SWITCH;
read BUF1;
flag2 = WFLAG;
sw2 = SWITCH;
read BUF2;
bad1 = (sw1 != sw2) || flag1 || flag2;
if (READING[r] == WRITING[r]) {
  return the value in COPYBUF[r];
} else if (bad1) {
  return the value read from BUF2;
} else {
  return the value read from BUF1;
}

Write operation.

WFLAG = true;
write to BUF1;
SWITCH = !SWITCH;
WFLAG = false;
for (each reader r) {
  if (READING[r] != WRITING[r]) {
    write to COPYBUF[r];
    WRITING[r] = READING[r];
  }
}
write to BUF2;



Fig. 2. Peterson’s algorithm. Lower-case variables are local variables. 



3 The New Algorithm 



The idea of the new algorithm is to remove the need for reading and writing
several buffers during read and write operations by utilizing the atomic
synchronization primitives available on modern multiprocessor systems. These primitives
are used for the communication between the readers and the writer. The new
algorithm uses n + 2 shared buffers that each can hold a value of the register.
The number of buffers is the same as for Peterson’s algorithm, which matches the
lower bound on the required number of buffers. No wait-free implementation can use
fewer buffers, since each of the n readers may be reading
from one buffer concurrently with a write, and the write should not overwrite
the last written value (since one of the readers might start to read again before
the new value is completely written).

The shared variables used by the algorithm are presented in Fig. 3. The
shared buffers are in the (n + 2)-element array BUF. The atomic variable SYNC
is used to synchronize the readers and the writer. This variable consists of two
fields: (i) the pointer field, which contains the index of the buffer in BUF that
contains the most recent value written to the register and (ii) the reading-bit field,
Constant      Description
PTRFIELDLEN   The number of bits used for the pointer field indexing the most
              recently updated buffer.
PTRFIELD      A bitmask containing 1's in the PTRFIELDLEN least significant bits.

Variable                       Type                   Description
SYNC                           Unsigned word          Consists of two fields:
  SYNC & PTRFIELD                                     Index of the buffer with the most recent register value.
  bit PTRFIELDLEN + r of SYNC                         The reading bit for reader r.
BUF                            Array of n+2 buffers   The buffers for the register value.



Fig. 3. The constants and shared variables used by the new algorithm. The number of 
readers is n. Initially BUF[0] holds the register value and SYNC points to this buffer 
while all reader-bits are 0. 



Read operation by reader r.

R1 readerbit = 1 << (r + PTRFIELDLEN)
R2 rsync = fetch_and_or(&SYNC, readerbit)
R3 rptr = rsync & PTRFIELD
R4 read(BUF[rptr])

Write operation.

W1 choose newwptr such that newwptr != oldwptr and newwptr != trace[r] for all r;
W2 write(BUF[newwptr]);
W3 wsync = swap(&SYNC, 0 | newwptr); /* Clears all reading bits */
W4 oldwptr = wsync & PTRFIELD;
W5 for each reader r {
W6   if (wsync & (1 << (r + PTRFIELDLEN))) {
W7     trace[r] = oldwptr;
W8   }
W9 }





Fig. 4. The read and write operations of the new algorithm. The trace-array and 
oldwptr are static, i.e. stay intact between write operations. They are all initialized to 
zero. 



which holds a handshake bit for each reader that is set when the corresponding 
reader has followed the value contained in the pointer field. 

A reader (Fig. 4) uses fetch-and-or to atomically read the value of SYNC and 
set its reading-bit. Then it reads the value from the buffer pointed to by the 
pointer field. 

The writer (Fig. 4) needs to keep track of the buffers that are available for
use. To do this it stores the index of the buffer where it last saw each reader in
an n-element array trace, in persistent local memory. At the beginning of each
write the writer selects a buffer index to write to. This buffer should be different
from the last one it used and with no reader intending to use it. The writer writes
the new value to that buffer and then uses the suboperation swap to atomically
read SYNC, update it with the new buffer index and clear the reading-bits.
The old value read from SYNC is then used to update the trace array for those
readers whose reading-bit was set.
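A minimal single-file C11 sketch of this protocol, assuming that `atomic_fetch_or` stands in for fetch-and-or and `atomic_exchange` for swap; NREADERS, MWORDS, reg_read, reg_write and last_written are illustrative names chosen here, and PTRFIELDLEN = 6 follows the 64-bit discussion below. Following the prose above, the writer avoids the buffer it wrote last (the one currently published in SYNC) and every buffer recorded in trace. The plain buffer copies rely on the ordering provided by these sequentially consistent read-modify-write operations; a production version would need a careful memory-model argument.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes: NREADERS and MWORDS are assumptions for this sketch. */
#define NREADERS 4
#define MWORDS 8
#define NBUFS (NREADERS + 2)                            /* n + 2 buffers */
#define PTRFIELDLEN 6
#define PTRFIELD ((UINT64_C(1) << PTRFIELDLEN) - 1)

static uint64_t BUF[NBUFS][MWORDS];   /* BUF[0] holds the initial value */
static _Atomic uint64_t SYNC = 0;     /* points to BUF[0], all reading bits 0 */

/* Writer-private state that persists between writes. */
static unsigned trace[NREADERS];      /* buffer each reader was last seen on */
static unsigned last_written = 0;     /* buffer currently published in SYNC */

/* Read operation by reader r (lines R1-R4 of Fig. 4). */
void reg_read(unsigned r, uint64_t out[MWORDS]) {
    uint64_t readerbit = UINT64_C(1) << (r + PTRFIELDLEN);     /* R1 */
    uint64_t rsync = atomic_fetch_or(&SYNC, readerbit);        /* R2 */
    unsigned rptr = (unsigned)(rsync & PTRFIELD);               /* R3 */
    memcpy(out, BUF[rptr], sizeof BUF[rptr]);                   /* R4 */
}

/* Write operation by the single writer (lines W1-W9 of Fig. 4). */
void reg_write(const uint64_t in[MWORDS]) {
    /* W1: at most NREADERS + 1 of the NBUFS buffers are excluded,
       so this loop always finds a free one. */
    unsigned newwptr = 0;
    for (; newwptr < NBUFS; newwptr++) {
        int busy = (newwptr == last_written);
        for (unsigned r = 0; r < NREADERS && !busy; r++)
            busy = (trace[r] == newwptr);
        if (!busy) break;
    }
    memcpy(BUF[newwptr], in, sizeof BUF[newwptr]);              /* W2 */
    uint64_t wsync = atomic_exchange(&SYNC, (uint64_t)newwptr); /* W3: clears reading bits */
    unsigned oldwptr = (unsigned)(wsync & PTRFIELD);            /* W4 */
    for (unsigned r = 0; r < NREADERS; r++)                     /* W5 */
        if (wsync & (UINT64_C(1) << (r + PTRFIELDLEN)))         /* W6 */
            trace[r] = oldwptr;                                 /* W7 */
    last_written = newwptr;
}
```

If a reader reads once and then stays idle, its trace entry keeps pointing at the buffer it used, so no later write selects that buffer. This is what lets a slow reader finish its copy undisturbed without any further handshake.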
The maximum number of readers is limited by the size of the words that
the two atomic primitives used can handle. If we are limited to 64-bit words, we
can support 58 readers, as 6 bits are needed for the pointer field to be able to
distinguish between 58 + 2 buffers.
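The arithmetic behind the 58-reader bound can be checked with a few lines of C: a p-bit pointer field must be able to index all n + 2 buffers, and the remaining w - p bits of the word hold one reading bit per reader. The helper max_readers is a hypothetical name introduced here.

```c
/* Largest number of readers n that fits in one w-bit SYNC word, given that
   the pointer field of p bits must index all n + 2 buffers (n + 2 <= 2^p)
   and the remaining bits hold one reading bit per reader (n + p <= w).
   Returns the best n and stores the corresponding p in *best_p. */
int max_readers(int w, int *best_p) {
    int best = 0;
    *best_p = 0;
    for (int p = 1; p < w && p < 31; p++) {
        int n = w - p;              /* bits left for reading bits */
        int cap = (1 << p) - 2;     /* readers whose n + 2 buffers p bits can index */
        if (cap < n) n = cap;
        if (n > best) { best = n; *best_p = p; }
    }
    return best;
}
```

For w = 64 it picks p = 6 and n = 58, matching the text. Under the same reasoning, 32-bit words would allow 27 readers with a 5-bit pointer field.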

4 Analysis 

We first prove that the new algorithm satisfies the conditions in Lamport’s cri- 
terion [12] (cf. Criterion 1 in section 2), which guarantee atomicity. 

Lemma 1. The new algorithm satisfies condition “No-irrelevant” 

Proof. A read r reads the value that is written by the write $\pi(r)$. Therefore r’s
read(BUF[j]) operation (line R4 in Fig. 4) starts after $\pi(r)$’s write(BUF[j])
operation (line W2 in Fig. 4) starts. On the other hand, the starting time-point
of the read(BUF[j]) operation is before the ending time-point of r and the
starting time-point of the write(BUF[j]) operation is after the starting time-point
of $\pi(r)$, so the ending time-point of r must be after the starting time-point
of $\pi(r)$, or $r \not\to \pi(r)$. □

Lemma 2. The new algorithm satisfies condition “No-past” 

Proof. We prove the lemma by contradiction. Assume there are a read r and a
write w such that $\pi(r) \to w \to r$, which means i) write $\pi(r)$ ends before write
w starts and write w ends before read r starts and ii) r reads the value written
by $\pi(r)$. Because $\pi(r) \to w \to r$, the value of SYNC that r reads (line R2 in
Fig. 4) is written by a write $w'$ using the swap primitive (line W3 in Fig. 4),
where $w' = w$ or $w \to w' \to r$, i.e. $w' \neq \pi(r)$ because $\pi(r) \to w$. On the
other hand, because r reads the buffer that is pointed to by SYNC (lines R2-R4),
r would read the buffer that has been written completely by $w' \neq \pi(r)$. That
means r does not read the value written by $\pi(r)$, a contradiction. □

Lemma 3. The new algorithm satisfies condition “No N-O inversion”

Proof. We prove the lemma by contradiction. Assume there are reads $r_1$ and $r_2$
such that $r_1 \to r_2$ and $\pi(r_2) \to \pi(r_1)$. Because i) $\pi(r_1)$ always keeps track of
which buffers are used by readers in the array trace[] (lines W4-W7 in Fig. 4)
and ii) the buffer $\pi(r_1)$ chooses to write to is different from those recorded in
trace[] as well as from the last buffer the writer has written, wptr (lines W1-W2), $r_1$
reads the value written by $\pi(r_1)$ only if $r_1$ has read the correct value of SYNC
(line R2 in Fig. 4) that has been written by $\pi(r_1)$ (line W3 in Fig. 4).

On the other hand, because $r_1 \to r_2$, the value of SYNC that $r_2$ reads must
be written by a write $w_k$ where $w_k = \pi(r_1)$ or $\pi(r_1) \to w_k$, i.e. $w_k \neq \pi(r_2)$
because $\pi(r_2) \to \pi(r_1)$. Moreover, because $r_2$ reads the buffer that is pointed to by
SYNC (lines R2-R4), $r_2$ would read the buffer that has been written completely
by $w_k \neq \pi(r_2)$ (lines W2-W3). That means $r_2$ reads a value that has not been
written by $\pi(r_2)$, a contradiction. □
Complexity. The complexity of a read operation is of order $O(m)$, where m is
the size of the register, for both the new algorithm and Peterson’s algorithm [23].
However, in Peterson’s algorithm the reader may have to read the value up to 3
times, while in our algorithm the reader reads the value only once and has to
use the fetch-and-or suboperation. A write operation in the new algorithm writes
one value of size m and then traces the n readers. The complexity of the write
operation is therefore of order $O(n + m)$. For Peterson’s algorithm, however, the
writer must in the worst case write to n + 2 buffers of size m, thus its complexity
is of order $O(n \cdot m)$. As the size of registers and the number of threads increase,
the new algorithm is expected to perform significantly better than Peterson’s
algorithm with respect to the writer. With respect to the readers, the handshake
mechanism used in the new algorithm can be more expensive than the
one used by Peterson’s, but on the other hand the new algorithm only needs
to read one m-word buffer, whereas Peterson’s needs to read at least two and
sometimes three buffers.
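Plugging numbers into these bounds makes the difference concrete. The sketch below uses a deliberately crude cost model (an assumption, not a measurement): one unit per register word copied plus, for the new writer, one unit per reader traced, ignoring the constant-cost flag and SYNC accesses. With n = 15 readers and m = 8192 words (one writer plus 15 readers corresponds to the 16-thread setting of Fig. 5), the new write costs 8192 + 15 = 8207 units, whereas Peterson’s worst-case write costs 17 · 8192 = 139264 units.

```c
/* Worst-case cost per operation under a simple model: one unit per register
   word copied, plus one unit per reader traced by the new writer.
   n = number of readers, m = register size in words. */
unsigned long new_write_cost(unsigned long n, unsigned long m) {
    return m + n;              /* one buffer plus a pass over the n readers */
}
unsigned long peterson_write_cost(unsigned long n, unsigned long m) {
    return (n + 2) * m;        /* BUF1, BUF2 and up to n copy buffers */
}
unsigned long new_read_cost(unsigned long m) {
    return m;                  /* exactly one buffer */
}
unsigned long peterson_read_cost(unsigned long m) {
    return 3 * m;              /* BUF1, BUF2 and possibly COPYBUF[r] */
}
```

The model says nothing about the relative cost of fetch-and-or versus plain reads, which is why the readers’ side has to be settled experimentally.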

Using the above, we have the following theorem: 

Theorem 1. A multi-reader, single-writer, m-word sized register can be
constructed using n + 2 buffers of size m each. The complexity of a read operation
is $O(m)$. The complexity of a write operation is $O(n + m)$.

5 Performance Evaluation 

The performance of the proposed algorithm was tested against: i) Peterson’s
algorithm [23], ii) a spinlock-based implementation with exponential backoff and
iii) a readers-writers spinlock with exponential backoff [1].

Method. We measured the number of successful read and write operations during a fixed period of time; the higher this number, the better the performance. In each test one thread is the writer and the remaining threads are readers. Two sets of tests have been performed: (i) one with low contention and (ii) one with high contention. During the high-contention tests, each thread reads or writes continuously, with no delay between successive accesses to the multi-word register. During the low-contention tests, each thread waits for a time interval between successive accesses to the multi-word register; this interval is much longer than the time taken by a single read or write. Tests have been performed for different numbers of threads and for different register sizes.
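The measurement method (one writer, the remaining threads readers, operations counted over a fixed period, with an optional pause for low contention) could be sketched as follows. This is only an illustration under stated assumptions: `LockedRegister` and `run_benchmark` are hypothetical names, the register is a plain mutex-protected baseline rather than the wait-free algorithm, and CPython's global interpreter lock means the numbers say nothing about real multiprocessor performance.

```python
import threading
import time


class LockedRegister:
    """Baseline m-word register guarded by one mutex. This is a stand-in for the
    lock-based baselines, not the wait-free algorithm of the paper."""

    def __init__(self, m: int) -> None:
        self._buf = [0] * m
        self._lock = threading.Lock()

    def write(self, value: int) -> None:
        with self._lock:
            for i in range(len(self._buf)):
                self._buf[i] = value

    def read(self) -> list:
        with self._lock:
            return list(self._buf)


def run_benchmark(n_threads: int, m: int, duration: float, delay: float = 0.0):
    """Thread 0 writes, threads 1..n-1 read; each counts its completed operations
    for `duration` seconds. delay > 0 mimics the low-contention setting."""
    reg = LockedRegister(m)
    counts = [0] * n_threads
    torn = [0]                      # reads whose words came from different writes
    stop = threading.Event()

    def worker(idx: int) -> None:
        value = 0
        while not stop.is_set():
            if idx == 0:
                value += 1
                reg.write(value)
            else:
                snap = reg.read()
                if any(word != snap[0] for word in snap):
                    torn[0] += 1
            counts[idx] += 1
            if delay:
                time.sleep(delay)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return counts, torn[0]


counts, torn = run_benchmark(n_threads=4, m=64, duration=0.2)
print("operations per thread:", counts, "torn reads:", torn)
```

The torn-read counter doubles as an atomicity check: every word of a snapshot must come from the same write.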

Systems. The performance of the new algorithm has been measured on both UMA (Uniform Memory Architecture) and NUMA (Non-Uniform Memory Architecture) multiprocessor systems. The difference between UMA and NUMA lies in how memory is accessed. In a UMA system, all processors have the same latency and bandwidth to the memory. In a NUMA system, processors are placed in nodes, and each node has part of the memory directly attached to it. The processors of one node have fast access to the memory attached to that node, but accesses to memory on another node have to be made over a network and are therefore significantly slower. The three systems used are:
Fig. 5. Average number of reads or writes per process on the UMA SunFire 880. Panels: (a) 16 threads, high contention; (b) 8192 words, high contention; (c) 16 threads, low contention; (d) 8192 words, low contention. Each panel compares the new algorithm, Peterson's register, a spinlock, and a readers/writers lock.



- A UMA Sun SunFire 880 with six UltraSPARC III+ processors at 900 MHz (8 MB L2 cache) running Solaris 9.

- A ccNUMA SGI Origin 2000 with 29 MIPS R10000 processors at 250 MHz (4 MB L2 cache) running IRIX 6.5.

- A ccNUMA SGI Origin 3800 with 128 MIPS R14000 processors at 500 MHz (8 MB L2 cache) running IRIX 6.5.

The systems were used non-exclusively, but on the SGI systems the batch system guarantees that the required number of CPUs was available. On the SunFire machine, the swap and fetch-and-or suboperations were implemented by the swap hardware instruction [25] and by a lock-free subroutine using the compare_and_swap hardware instruction, respectively. On the SGI Origin machines, swap and fetch-and-or were implemented by the lock_test_and_set and fetch_and_or synchronization primitives provided by the system [26], respectively.
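The lock-free fetch-and-or subroutine mentioned for the SunFire can be built from compare-and-swap with a retry loop. The sketch below assumes a hypothetical `AtomicWord` whose `compare_and_swap` stands in for the hardware instruction; in Python its atomicity has to be emulated with an internal lock, so only the retry logic, not the lock-freedom, carries over.

```python
import threading


class AtomicWord:
    """Emulated machine word. compare_and_swap stands in for the hardware
    instruction; the internal lock only emulates its atomicity in Python."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_and_or(word: AtomicWord, mask: int) -> int:
    """Atomically set the bits of mask in word and return the previous value.
    Retries if another thread changed word between the load and the CAS."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old | mask):
            return old


flags = AtomicWord(0b0101)
previous = fetch_and_or(flags, 0b0010)
print(bin(previous), bin(flags.load()))
```

On real hardware a failed CAS means some other thread's CAS succeeded, which is what makes the loop lock-free rather than merely correct.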
Fig. 6. Average number of reads or writes per process on the NUMA Origin 2000. Panels: (a) all operations, 14 threads, high contention; (b) all operations, 8192 words, high contention; (c) writer only, 14 threads, high contention; (d) writer only, 8192 words, high contention; (e) all operations, 14 threads, low contention; (f) all operations, 8192 words, low contention. Each panel compares the new algorithm, Peterson's register, a spinlock, and a readers/writers lock, plotted against register size (words) or number of threads.
Fig. 7. Average number of reads or writes per process on the NUMA Origin 3800. Panels: (a) 16 threads, high contention; (b) 8192 words, high contention; (c) 16 threads, low contention; (d) 8192 words, low contention. Each panel compares the new algorithm, Peterson's register, a spinlock, and a readers/writers lock.



Results. As the analysis predicts and the diagrams of the experimental outcome confirm, the lock-based solutions do not come close to the wait-free algorithms unless the number of threads is minimal (2) and the register is small. Moreover, as expected from the analysis, the algorithm proposed in this paper performs at least as well as Peterson's wait-free solution, and better in the cases with large registers.

More specifically, on the UMA SunFire the new algorithm outperforms the others for large registers under both low and high contention (cf. Fig. 5).

On the NUMA Origin 2000 and Origin 3800 platforms (cf. Figs. 6 and 7, respectively), we observe the effect of the particular architecture: contention on the synchronization variable may become high, affecting the performance of the solutions. Looking at the performance diagrams for these systems, we still see that the writer in the new algorithm performs significantly better than the writer in Peterson's algorithm in both the low- and high-contention scenarios. Recall that the writer in Peterson's algorithm has to write to more buffers as the number of readers grows, whereas the writer of the algorithm proposed here has no such problem. This slower writer, though, has a positive side effect in Peterson's algorithm: as the writer becomes slower, the chances that the readers have to read their individual buffers in addition to the regular two buffers become smaller. Hence, under high contention the difference in reader performance between the two wait-free algorithms becomes smaller.

6 Conclusions 

This paper presents a simple and efficient algorithm for atomic registers (memory words) of arbitrary size. The simplicity and the good time complexity of the algorithm are achieved through the use of two common synchronization primitives. The paper also presents a performance evaluation of (i) the new algorithm; (ii) a previously known practical algorithm that is based only on read and write operations; and (iii) two mutual-exclusion-based registers. The evaluation is performed on three different well-known multiprocessor systems.

Since shared objects are very commonly used in parallel/multithreaded applications, such results, and further research along this line on shared-object implementations, are significant for providing better support for efficient synchronization and communication in these applications.
Abstract. Game tree search is the core of most attempts to make computers play games. Another approach is to store all possible positions in a database and to precompute the true values of all of them. Such databases allow computers to play optimally, in the sense that they will win every game once they reach a winning position, and that they will never lose from a drawn or won position. This is the situation in games like Connect-4, in chess endgames and in many other settings. Nevertheless, it has been observed that these programs do not play strongly in tournaments against strong, but imperfect, human players. In this paper we propose a mixture of game tree search and the database approach that attacks this problem on the basis of a known theoretical error analysis of game trees. Experiments show encouraging results.



1 Introduction 

Game-playing computer programs are becoming stronger and stronger. One reason is that the search procedures of forecasting programs have become faster and more sophisticated. At the same time, database approaches have been developed further. Some 6-man chess endgames (endgames with no more than 6 pieces on the board) have been solved, and so have several easier games like Connect-4. This means that for every position of the given game, the true value is known. Astonishingly, it is precisely this kind of perfection that leaves the mark of stupidity. When such a 'perfect' program reaches a winning position, it will always win, and when it reaches a drawn position, it will never lose. Nevertheless, when it plays against fallible opponents, it is often unable to bring the opponent into trouble and to reach a better position than it had at the beginning. The programs cannot distinguish between 'easy' and 'difficult' positions for their opponents. This problem has rarely been tackled [14,7], and, to the best of our knowledge, never on the basis of a theoretical error analysis.

1.1 Game Tree Search Is an Error Filter 

Fig. 1. Only the (pre-)chosen partial tree is examined by a search algorithm.

In this paper, $G = (T, h)$ is a game tree if $T = (V, E)$ is a tree and $h$ is a function mapping nodes from $V$ into $\{-1, 0, 1\}$. We identify the nodes of a game tree $G$ with positions of the underlying game, and the edges of $T$ with moves from one position to the next. There are two players, MIN and MAX, and $V$ can be split into two disjoint subsets of nodes: MAX makes moves at MAX-nodes, and MIN makes moves at MIN-nodes. In figures, MAX-nodes are symbolized by angular boxes and MIN-nodes by circles. An algorithm that follows the minimax principle computes the values of inner tree nodes from the values of their successors: when $v$ is a MAX-node, the value of $v$ is the maximum over the values of $v$'s successors; when $v$ is a MIN-node, the value of $v$ is the minimum over the values of $v$'s successors.
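The minimax rule just described can be sketched in a few lines. The tuple encoding of nodes is our own assumption for illustration: a leaf is its value in $\{-1, 0, 1\}$, and an inner node is a pair of its kind and its list of successors.

```python
def minimax(node):
    """Minimax value of a node: a leaf is an int in {-1, 0, 1},
    an inner node is ("max" | "min", [successors])."""
    if isinstance(node, int):
        return node
    kind, successors = node
    values = [minimax(s) for s in successors]
    return max(values) if kind == "max" else min(values)


tree = ("max", [("min", [1, 0]), ("min", [1, 1, -1]), ("min", [0, 1])])
print(minimax(tree))  # MIN-nodes yield 0, -1, 0; the MAX-root takes 0
```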

For most interesting board games, we do not know the correct values of all positions. Therefore, we are forced to base our decisions on heuristic or vague knowledge. Typically, a game-playing program consists of three parts: a move generator, which computes all possible moves in a given position; an evaluation procedure, which implements a human expert's knowledge about the value of a given position (these values are quite heuristic, fuzzy and limited); and a search algorithm, which organizes a forecast.

At some depth, the complete game tree (as defined by the rules of the game) is cut, as shown in Figure 1; the artificial leaves of the resulting subtree are evaluated with the heuristic evaluation, and these values are propagated to the root of the game tree as if they were real ones. For 2-person zero-sum games, computing this heuristic minimax value has been by far the most successful approach in the history of computer game playing. When Shannon [18] proposed a design for a chess program in 1949, a design that at its core is still used by all modern game-playing programs, it seemed quite reasonable that deeper searches lead to better results. The astonishing observation over the last 40 years in chess and some other games is that the game tree acts as an error filter: the faster and more sophisticated the search algorithm, the better the search results!

Usually the tree to be searched is examined with the help of the Alphabeta algorithm [9] or the MTD(f) algorithm [1]. In professional game-playing programs, mostly the Negascout [15] variant of the Alphabeta algorithm is used. For theoretical examinations it is of minor importance which variant is used, because as far as the error frequency is concerned, it makes no difference whether a pre-selected partial game tree is examined by the $\alpha\beta$-algorithm or by a pure minimax algorithm. In both cases the result is the same; only the effort needed to obtain it differs drastically.
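The claim that plain minimax and the $\alpha\beta$-algorithm return the same value on a pre-selected tree, while the effort differs, can be checked on a toy tree. This is a textbook fail-soft alpha-beta, not the Negascout or MTD(f) variants named above; the node encoding and the leaf counter are hypothetical.

```python
import math


def minimax(node, counter):
    """Plain minimax; counter[0] counts evaluated leaves."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    kind, children = node
    values = [minimax(child, counter) for child in children]
    return max(values) if kind == "max" else min(values)


def alphabeta(node, alpha, beta, counter):
    """Alpha-beta over the same pre-selected tree; with the full window it
    returns the minimax value but may skip leaves."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    kind, children = node
    if kind == "max":
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, alpha, beta, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break               # MIN above already has a better alternative
        return value
    value = math.inf
    for child in children:
        value = min(value, alphabeta(child, alpha, beta, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break                   # MAX above already has a better alternative
    return value


tree = ("max", [("min", [3, 5]), ("min", [2, 9]), ("min", [1, 7])])
mm, ab = [0], [0]
print(minimax(tree, mm), alphabeta(tree, -math.inf, math.inf, ab), mm[0], ab[0])
```

Both calls return the same root value; alpha-beta skips the leaves 9 and 7 because the first successor of each later MIN-node already refutes it.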
The approximation of the real root value with the help of fixed-depth searches leads to good results. Nevertheless, several enhancements that shape the search tree more selectively have been found. Some of these techniques are domain-independent, like Singular Extensions [4], Nullmoves [6], Conspiracy Number Search [12,16], and Controlled Conspiracy Number Search [10]. Many others are domain-dependent. It is quite obvious that the shape of the search tree strongly determines the quality of the search result.

1.2 Modeling the Mystery 

In the 1980s, however, the first theoretical analyses suggested that searching deeper might lead to worse results. Papers by Pearl [13], G. Schrüfer [17], Althöfer [2,3], and Kaindl and Scheucher [8] deal with this error behavior. Lorenz and Monien [11] presented a one-to-one relation between the number of leaf-disjoint strategies and the robustness of minimax game trees with values 0 and 1.

The newer papers have one idea in common: the authors assume that there is a reality in which all positions of the game have an unequivocally true value. The art of a playing program, which can only deal with heuristic and vague value estimations, is to find out as much as possible about these true values. It is assumed that the heuristic evaluation procedure of the game-playing programs is quite good, in the sense that the heuristic values are mostly nearly correct, or that at least the relations between various values are more or less correct. The question is why and under which conditions the game tree helps or does not help the programs in their difficult task. One assumption might be that the ratio of inner tree nodes to game leaf nodes varies, such that our definite knowledge about the end of a game (e.g. mate or stalemate) may help to improve the quality of the forecast. Another assumption might be that the quality of the programs' heuristic evaluation procedure increases with increasing depth. When we look at, e.g., the game of chess, however, both assumptions seem unrealistic. When we are at move ten, it is not clear why the quality of our evaluation procedure should be better 20 plies away from the current position than only 16 plies away. Nor does an examination of the programs' search trees show that the number of mates and stalemates increases with increasing depth. A third assumption is that it has to do with the structure of the game tree: the distribution of the possible true values over the nodes of the game tree might be a key. Indeed, this is exactly the assumption which we follow in this paper.

1.3 Modern Models and Results 

The main result in [11] is the following. Three different measures for robustness 
are examined. 

A) An arbitrary but fixed finite game tree $G$ is given. Each of its leaves has a 'true' value of 0 or 1. These values are propagated to inner nodes following the minimax principle, yielding true values for all nodes of the tree, in particular the root. A second value is assigned to each leaf, called the heuristic value. With probability $p$, the heuristic and the true value of a leaf are equal. These values are propagated to the inner nodes according to the minimax principle as well. Denote by $Q_G(p)$ the probability that the true and the heuristic value of the root of $G$ are equal. If we denote by $Q_G^{(n)}$ the $n$-th derivative of $Q_G$, robustness is measured by the number $n$ of leading derivatives $Q_G^{(1)}, \ldots, Q_G^{(n)}$ that are all zero at $p = 1$. The motivation is as follows. Figure 2 shows three different courses of $Q(p)$ together with the identity function; in principle, the graph of $Q(p)$ can run in these three different ways. In the case $Q(p) < p$, we see no motivation for performing any search at all; it is obviously more effective to evaluate the root directly. Thus, the search tree is only useful (in an intuitive sense, here) when $Q(p) > p$, since only then is the error rate of the computed minimax value of the root smaller than the error rate of a direct evaluation of the root. Thus, if $Q_G(p)$ and $Q_H(p)$ are the quality polynomials of two game trees $G$ and $H$ with $Q_G'(1) = 0$ and $Q_H'(1) > 1$, there is an $\varepsilon > 0$ such that $Q_G(p) > Q_H(p)$ for all $p \in [1 - \varepsilon, 1)$, and thus $G$ is more useful than $H$. The game trees examined by iterative deepening search processes form a sequence of supertrees resp. subtrees. It is now quite intuitive to demand that the error probability of supertrees should be below that of their subtrees, at least for $p \in [1 - \varepsilon, 1)$, in order to formalize what it means to say that deeper search processes lead to better results. A point of criticism might be that error probabilities are assumed to be independent of each other and describable by a constant $p$. This model, however, can be seen as a special case of average-case analysis.

Fig. 2. Possible courses of $Q(p)$

Let us call an average-case error analysis in which the number of occurring errors is weighted according to a binomial distribution a relaxed average-case analysis. The term expresses that we are interested in how errors propagate in minimax trees when we presume an error rate of 'approximately x percent'.

Let $G$ be an arbitrary game tree with $n$ leaves, and let $s$ be the string of 0s and 1s consisting of the real values of the leaves of $G$ in left-to-right order. Furthermore, let $s' \in \{0, 1\}^n$ be a further string of length $n$. If $s(i) = s'(i)$ at a position $i$, we say 'the $i$-th leaf of $s'$ has been correctly classified'. When we put all strings of length $n$ into clusters $C_0, \ldots, C_n$, each cluster $C_i$ containing those strings which have exactly $i$ correctly classified items, there exists a certain number $c_i$ for each cluster, which tells us in how many cases the root is correctly classified by a minimax evaluation of $G$.

Since

$$Q_G(p) = \sum_{i=0}^{n} \Pr(\text{root correctly classified} \mid \text{exactly } i \text{ leaves correctly classified}) \cdot \Pr(\text{exactly } i \text{ leaves correctly classified}) = \sum_{i=0}^{n} \frac{c_i}{|C_i|} \binom{n}{i} x^i (1-x)^{n-i},$$

with $x$ being equal to the probability $p$ of our combinatorial model (and $|C_i| = \binom{n}{i}$, so that the sum simplifies to $\sum_{i=0}^{n} c_i\, x^i (1-x)^{n-i}$), the proposed model is nothing but a convenient mathematical vehicle in which we perform a relaxed average-case error analysis. This means that it is difficult to place errors in such a way that the analysis fails to reflect reality. Average-case analyses are a common tool in computer science. Nevertheless, whether or not a given average-case analysis is well suited to analyze a given phenomenon can only be decided by experiments.

Fig. 3. A MAX- and a MIN-strategy

B) The robustness is the number of leaves whose values can be arbitrarily changed without the root value being affected. Thus, the robustness is the conspiracy number (minus 1) of the game tree and the root value, seen from the side of the true values.

C) A strategy is a subtree of a given game tree with the following properties: (a) the root is part of the strategy; (b) for one of the players, exactly one successor of each of his nodes in the strategy is in the strategy as well, while for the other player, all successors of his nodes in the strategy are always in it. Figure 3 shows what a strategy looks like. On the right, we see a so-called MIN-strategy, which proves an upper bound on the value of the root. On the left, the MAX-strategy proves a lower bound on the root value. Thus, a strategy is quite a tiny substructure of a game tree which gives us a lower or an upper bound on the root value. As we allowed only the values 0 and 1, a MIN-strategy with value 0 shows that the root value is 0, and a MAX-strategy with value 1 shows that the root value is 1. The robustness is measured in the number of leaf-disjoint strategies which give us either a lower bound 1 or an upper bound 0 on the root value.
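The cluster formulation of $Q_G(p)$ in A) can be checked by brute force on small trees with 0/1 leaves. This sketch assumes a hypothetical nested-tuple encoding in which leaves are indices into the list of true leaf values; `cluster_counts` enumerates all strings $s'$ and records $c_i$, and `quality` evaluates the weighted sum over the clusters $C_i$.

```python
from itertools import product
from math import comb, isclose


def evaluate(node, leaves):
    """Minimax value; a node is a leaf index (int) or ("max" | "min", [children])."""
    if isinstance(node, int):
        return leaves[node]
    kind, children = node
    values = [evaluate(child, leaves) for child in children]
    return max(values) if kind == "max" else min(values)


def cluster_counts(tree, true_leaves):
    """c[i]: number of strings s' with exactly i correctly classified leaves
    whose minimax root value equals the true root value."""
    n = len(true_leaves)
    root = evaluate(tree, true_leaves)
    c = [0] * (n + 1)
    for correct in product([True, False], repeat=n):
        heuristic = [v if ok else 1 - v for v, ok in zip(true_leaves, correct)]
        if evaluate(tree, heuristic) == root:
            c[sum(correct)] += 1
    return c


def quality(tree, true_leaves, p):
    """Q_G(p) = sum_i (c_i / |C_i|) * Pr(exactly i leaves correct)."""
    n = len(true_leaves)
    c = cluster_counts(tree, true_leaves)
    total = 0.0
    for i in range(n + 1):
        size = comb(n, i)                                 # |C_i|
        pr_root_ok = c[i] / size
        pr_i_correct = size * p**i * (1 - p) ** (n - i)   # binomial weight
        total += pr_root_ok * pr_i_correct
    return total


G = ("max", [("min", [0, 1]), ("min", [2, 3])])
print(quality(G, [1, 1, 1, 1], 0.9))
```

For this two-level tree $Q_G(p) = 2p^2 - p^4$, which exceeds $p$ for $p$ close to 1, so search is useful in the sense of Figure 2.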
Theorem 1. The three given robustness notions are equivalent to each other: $Q_G^{(1)}(1) = \cdots = Q_G^{(k-1)}(1) = 0$ if and only if there are $k$ leaf-disjoint strategies that prove the true value of the root, if and only if the values of any $k - 1$ arbitrarily chosen leaves can be changed arbitrarily without the true value of the root changing.

In particular, $k$ leaf-disjoint strategies ensure that $Q_G(1 - \varepsilon) \geq 1 - c\,\varepsilon^k$ for small $\varepsilon$, with $c$ a constant. Thus, the number of leaf-disjoint strategies is a good indication of the robustness of a game tree. This was the first result for concrete (as opposed to randomly chosen, vague) game trees.

Doerr and Lorenz [5] extended the result to arbitrary leaf values and additionally showed that even for minimax trees with arbitrary leaf values, $Q_G^{(1)}(1) = \cdots = Q_G^{(k-1)}(1) = 0$ holds if and only if one can arbitrarily change $k - 1$ leaf values of $G$ without the root value of $G$ changing. Thus, $k$ is the conspiracy number of $G$.
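The equivalence between vanishing derivatives and the number of leaves that can be changed can be illustrated on a small 0/1 tree. The sketch computes the exact error polynomial $R(\varepsilon) = 1 - Q_G(1 - \varepsilon)$ with integer coefficients, reads off the first non-vanishing derivative of $Q_G$ at 1, and compares it with a brute-force count of the fewest leaf changes that flip the root. Function names and the tree encoding are hypothetical, and brute force limits it to toy trees.

```python
from itertools import combinations, product
from math import comb


def evaluate(node, leaves):
    """Minimax value; a node is a leaf index (int) or ("max" | "min", [children])."""
    if isinstance(node, int):
        return leaves[node]
    kind, children = node
    values = [evaluate(child, leaves) for child in children]
    return max(values) if kind == "max" else min(values)


def error_polynomial(tree, true_leaves):
    """Integer coefficients r[j] of R(eps) = 1 - Q_G(1 - eps), i.e. the probability
    that the heuristic root value is wrong when each leaf is wrong w.p. eps."""
    n = len(true_leaves)
    root = evaluate(tree, true_leaves)
    r = [0] * (n + 1)
    for wrong in product([False, True], repeat=n):
        heuristic = [1 - v if w else v for v, w in zip(true_leaves, wrong)]
        if evaluate(tree, heuristic) != root:
            w = sum(wrong)
            for t in range(n - w + 1):          # eps^w * (1 - eps)^(n - w)
                r[w + t] += comb(n - w, t) * (-1) ** t
    return r


def first_nonvanishing_order(r):
    """Smallest j >= 1 with Q_G^(j)(1) != 0, using Q_G^(j)(1) = -(-1)^j * j! * r[j]."""
    return next(j for j in range(1, len(r)) if r[j] != 0)


def conspiracy_number(tree, true_leaves):
    """Fewest leaves whose values must change to change the root value."""
    n = len(true_leaves)
    root = evaluate(tree, true_leaves)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            changed = [1 - v if i in subset else v for i, v in enumerate(true_leaves)]
            if evaluate(tree, changed) != root:
                return k
    return None


G = ("max", [("min", [0, 1]), ("min", [2, 3])])
leaves = [1, 1, 1, 1]
r = error_polynomial(G, leaves)
print(r, first_nonvanishing_order(r), conspiracy_number(G, leaves))
```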

In this paper, we present an algorithm which gets the perfect knowledge of a game database as input, and whose task is to bring an opponent without perfect knowledge into trouble. The algorithm is based on the theoretical error analysis of game trees just presented. In Section 2 we describe the algorithm, and in Section 3 we present experimental data from its implementation.



2 Exploring the Local Game Tree Structure 

Our aim is to find an algorithm which makes perfect moves only and, at the same time, maximizes its opponent's chances to fail. As the aim is to profit from the fact that some opponents may not have perfect knowledge of the game, we call it a cheating algorithm. Let $v$ be a tree node with value $x$. Let $T$ be the smallest possible subtree of the overall game tree rooted at $v$ which contains $c$ leaf-disjoint strategies that prove the value $x$. Let $A$ be a search algorithm which tries to evaluate $v$ with the help of search and a heuristic evaluation function $h: V \to \mathbb{Z}$. Let $h$ be nearly perfect, making faults that are randomly distributed. According to the previous analysis, either $A$ examines all nodes of $T$, or $A$ wastes time, because $A$'s probability of evaluating the node $v$ correctly is not significantly higher than that of a static evaluation $h(v)$. Note that $T$ can be arbitrarily large and arbitrarily irregular. Next, we try to identify moves which lead into positions that are difficult to judge for every non-perfect opponent who tries to profit from a forecasting search.

Let us inspect the example presented in Figure 4. We assume that the true values of the game are −1, 0 and 1. On level 0, at node $v$, we want to find a perfect move which is difficult to assess for an arbitrary opponent. On level 1 our opponent has to move, and he will try to prove some move to be the best one. Let us inspect the left branch of the root. The value of node $v_1$ is 0, and it has two successors with value 0 and one with value 1. Our opponent is in danger of making a mistake when he either underestimates the two nodes $v_4$ and $v_5$, or when he overestimates $v_6$. In order to make no mistake, he must see that $v_4$ or $v_5$ is a good successor, and he must see that all successors with value 1 (i.e. $v_6$) are bad for him. Let $z_4$ be the number of nodes of the smallest tree below $v_4$ that contains two leaf-disjoint strategies, both showing that the value of $v_4$ is less than or equal to 0. Let $z_5$ be defined analogously for $v_5$. Let $z_6$ be the number of nodes of the smallest tree below $v_6$ that contains two leaf-disjoint strategies, both showing that the value of $v_6$ is greater than or equal to 1. We measure the difficulty of $v_1$ as $\min\{z_4, z_5\} + z_6$.

Fig. 4. The upper part of a game tree with a MAX-node at the root

More generally: to simplify the description, in the following let $v$ be a MIN-successor of a MAX-root and $v_1, \ldots, v_b$ the successors of $v$. Let the value of the root be $\leq 0$, because in the case of root values $> 0$ we play perfectly and win. In the following, a cn-$C$ tree rooted at node $v$ and showing that $f(v) \geq 0$ (resp. $> 0$, $< 0$, $\leq 0$) is a tree $T$ which proves the desired bound on the value of $v$ even if we arbitrarily change up to $C - 1$ leaf values in $T$. We determine how difficult it is to correctly estimate the value of node $v$ as follows.

function difficulty(MIN-node v, robustness C)
  // let v be a MIN-node with value -1 or 0
  // let f be the function that maps nodes to their true values f(v)
  // let v_1, ..., v_b be the successors of v
  if (f(v) == 0) {
    cntSUM = sum over all v_i with f(v_i) > 0 of the size of a minimum
             cn-C tree rooted at v_i and showing that f(v_i) > 0
    cntMIN = minimum over all v_i with f(v_i) <= 0 of the size of a minimum
             cn-C tree rooted at v_i and showing that f(v_i) <= 0
  } else if (f(v) < 0) {
    cntSUM = sum over all v_i with f(v_i) >= 0 of the size of a minimum
             cn-C tree rooted at v_i and showing that f(v_i) >= 0
    cntMIN = minimum over all v_i with f(v_i) < 0 of the size of a minimum
             cn-C tree rooted at v_i and showing that f(v_i) < 0
  }
  return cntMIN + cntSUM;

Finding a cn-C tree is quite easy and can be done efficiently, because it is based on strategies. We assign a so-called target $t_v = (c, b, f(v))$ with $c \in \mathbb{N}$ and $b \in \{\leq, \geq\}$ to the node $v$. If $v$ is an ALL-node (i.e. $v$ is a MAX-node and $b$ is '$\leq$', or $v$ is a MIN-node and $b$ is '$\geq$'), all successors of $v$ must be examined and $t_v$ is given to all of these successors. Otherwise, if $v$ is a CUT-node, $t_v$ can be split among those successors $v_1, \ldots, v_d$ with $f(v_i)\ b\ f(v)$. ('$b$' is a placeholder for $\leq$ or $\geq$.) Let $t_i := (c_i, b, f(v))$, $1 \leq i \leq d$, be the targets for these successors $v_1, \ldots, v_d$. The split must be done such that $\sum_{i=1}^{d} c_i$ equals $c$. If $c = 1$ there is nothing else to do. We assume that every end of the game can be recognized correctly, so that when the end of the game is reached by mate, repetition or the 50-move rule, the target is fulfilled. Unfortunately, we need not just any such subtree but the smallest one with the desired properties, and there are many possibilities for how to split a target. Nevertheless, all these possibilities can be examined with the help of a breadth-first search, so that the effort is limited. In fact, this information could be precomputed and incorporated into the database. For cn-2 the desired tree is even unique. For our experiments we used cn-2 and cn-3.
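The `difficulty` function and the target splitting can be turned into a runnable sketch for tiny trees. Assumptions to note: nodes are encoded as nested tuples with exact values at game ends; a node whose target has $c = 1$ counts as a single leaf of the cn-$C$ tree; and the smallest split at CUT-nodes is found by a small knapsack-style dynamic program instead of the breadth-first search mentioned above. Names such as `cn_size` are hypothetical.

```python
import math

# Hypothetical encoding: an int in {-1, 0, 1} is a recognized end of the game
# (its value is exact); an inner node is ("max" | "min", [children]).


def value(node):
    """True (database) value f(node) by minimax."""
    if isinstance(node, int):
        return node
    kind, children = node
    values = [value(child) for child in children]
    return max(values) if kind == "max" else min(values)


def holds(v, op, x):
    return {"<=": v <= x, "<": v < x, ">=": v >= x, ">": v > x}[op]


def cn_size(node, c, op, x):
    """Nodes in a smallest cn-c tree rooted at node proving f(node) op x
    (math.inf if the bound is false)."""
    if not holds(value(node), op, x):
        return math.inf
    if c == 1 or isinstance(node, int):
        return 1                       # target fulfilled: nothing else to do
    kind, children = node
    upper = op in ("<=", "<")
    if (kind == "max" and upper) or (kind == "min" and not upper):
        # ALL-node: every successor receives the full target
        return 1 + sum(cn_size(child, c, op, x) for child in children)
    # CUT-node: split c = c_1 + ... + c_d (each c_i >= 1) over successors.
    # best[k] = cheapest way found so far to cover k of the c strategies.
    best = [0] + [math.inf] * c
    for child in children:
        cost = [cn_size(child, j, op, x) for j in range(1, c + 1)]
        new = best[:]
        for have in range(c):
            if best[have] == math.inf:
                continue
            for j in range(1, c - have + 1):
                new[have + j] = min(new[have + j], best[have] + cost[j - 1])
        best = new
    return 1 + best[c]


def difficulty(v, C):
    """Difficulty for the opponent at MIN-node v (true value -1 or 0)."""
    _, children = v
    f_v = value(v)
    pairs = [(child, value(child)) for child in children]
    if f_v == 0:
        cnt_sum = sum(cn_size(ch, C, ">", 0) for ch, f in pairs if f > 0)
        cnt_min = min(cn_size(ch, C, "<=", 0) for ch, f in pairs if f <= 0)
    elif f_v < 0:
        cnt_sum = sum(cn_size(ch, C, ">=", 0) for ch, f in pairs if f >= 0)
        cnt_min = min(cn_size(ch, C, "<", 0) for ch, f in pairs if f < 0)
    else:
        raise ValueError("v must be a MIN-node with value -1 or 0")
    return cnt_min + cnt_sum


v1 = ("min", [0, ("max", [0, -1]), ("max", [1, -1])])
print(difficulty(v1, C=2))
```

For `v1`, the cheapest confirmation of a good reply is the terminal successor with value 0 (one node), and refuting the successor with value 1 needs a tree of two nodes, giving a difficulty of 3.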



3 Experiment 

We played an all-against-all chess tournament with 8 chess programs, the results of which are shown in Figure 5. The programs are 'R', 'CN2', 'CN3', 'HP', 'Shredder6', 'Fritz7', 'Gromit3.1', and 'SOS March 2000'. The last four are commercially available programs. 'Shredder6' and 'Fritz7' played with a fixed depth of 8, 'Gromit3.1' and 'SOS March 2000' only with a fixed depth of 4. These four programs have no database information. 'CN2' and 'CN3' are programs which follow the idea of the previous section. 'R' is a program which just randomly chooses one of the perfect moves; thus, it is also a cn-1 program. 'HP' also chooses only perfect moves, but it performs a depth-4 so-called prob-max search: on even levels, it builds the value of a node $v$ by taking the maximum over the values of $v$'s children; on odd levels, by taking the average over them. The idea is taken from [?], but we allowed neither sub-perfect moves at the root nor any domain-specific knowledge.
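The prob-max rule used by 'HP' can be sketched as follows. The encoding and function names are hypothetical; at the search horizon the sketch falls back on the exact database value, and at the root it only considers perfect moves, as described above.

```python
from statistics import mean


def true_value(node):
    """Database value: a node is an int in {-1, 0, 1} or ("max" | "min", [children])."""
    if isinstance(node, int):
        return node
    kind, children = node
    values = [true_value(child) for child in children]
    return max(values) if kind == "max" else min(values)


def prob_max(node, depth, level):
    """Maximum over children on even levels, average on odd levels;
    at the horizon (or at an end of the game) the database value is used."""
    if isinstance(node, int) or depth == 0:
        return true_value(node)
    _, children = node
    values = [prob_max(child, depth - 1, level + 1) for child in children]
    return max(values) if level % 2 == 0 else mean(values)


def choose_move(root, depth=4):
    """Index of the perfect root move with the highest prob-max value."""
    best = true_value(root)
    _, children = root
    perfect = [i for i, child in enumerate(children) if true_value(child) == best]
    return max(perfect, key=lambda i: prob_max(children[i], depth - 1, level=1))


root = ("max", [("min", [0, 0]), ("min", [0, 1, 1]), ("min", [-1, 1])])
print(choose_move(root))
```

Both of the first two moves are perfect (value 0), but the second leaves the opponent more replies that lose, so its averaged value is higher.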

We selected 20 endgame positions (Figure 6) as start-up positions, and each program played 40 games (20 with white and 20 with black) against each of the other programs. The 20 start-up positions are taken from endgame databases, so that the real values are known for all these positions, as well as for all possible sequences of moves starting from these positions up to the end of the game.

Fig. 5. Test Results

Each row of Figure 5 shows how a given program performed against the others. For example, the CN3 program scored 23.5 points against Fritz, and it scored 159 points altogether, as can be seen in the Σ column. The last four columns show how the programs scored when three of the four database-driven programs are excluded. For example, the CN3Σ column shows how many points the programs scored in a sub-tournament in which only the programs CN3, Fritz, Shredder, Gromit, and SOS participated.

The final ranking is strongly influenced by the selection of the start-up positions. If these positions are too difficult, none of the commercial programs can profit from its positional knowledge. If they are too simple, no program fails. For example, the gaps between the programs could be made arbitrarily small just by adding further simple drawing positions. Moreover, each program plays every position equally often with black and with white, so that it was not possible to score more than 50% against the database-based programs. Nevertheless, we observe that 'CN2' never did worse than 'R' or 'HP', and 'CN3' never did worse than 'CN2', not even when considering the games against one specific opponent. Thus, we see clear progress by the conspiracy-number-driven programs, and hardly any progress with the prob-max idea. When we examine the sub-tournaments with 'Shredder6', 'Fritz7', 'Gromit3.1', 'SOS March 2000' and only one database-based program at a time, we see that only 'CN2' and 'CN3' win their tournaments, while 'HP' and 'R' only reach rank 3. For example, you get CN3Σ when you delete all games of CN2, HP and R from the table in Figure 5 and sum up the rest.

We might go even further and ask which program performs best in cheating 'SOS March 2000' (the least successful program in this little tournament). Let us consider only half of the games, i.e. 'SOS' always plays white, except for positions 4, 14, and 17, where it plays black. Then the ranking is Shredder (14 pts), Fritz (11.5 pts), CN3 and CN2 (10.5 pts each), Gromit (9 pts), HP (8.5 pts) and R (8 pts). However one turns the numbers, CN3 and CN2 are always much more successful than HP and R.
 #  FEN (placement, to move)     Counters  Material
 1  8/R7/2r5/5p2/6k1/8/5K2/8 b   0 1       KRkrp
 2  8/6R1/8/5p2/6k1/2r5/5K2/8 b  0 8       KRkrp
 3  8/6r1/8/4k3/8/5B2/5K2/8 w    0 8       KBkr
 4  8/6r1/8/4k3/8/4BR2/5K2/8 b   0 8       KRBkr
 5  7k/6p1/6q1/8/8/8/4QK2/8 w    0 8       KQkqp
 6  6K1/3p4/3k4/8/N7/8/8/1N6 b   0 1       KNNkp
 7  8/8/R7/8/k7/3P4/6b1/K7 b     0 1       KRPkb
 8  8/7Q/7K/2r5/2p5/8/1k6/8 w    0 1       KQkrp
 9  8/p7/8/4K3/7k/5b2/8/1R6 w    0 1       KRkbp
10  k1K5/2n5/8/8/b7/1R6/8/8 w    0 1       KRkbn
11  k7/p7/8/K7/R7/8/8/5n2 w      0 1       KRknp
12  K7/5b2/2R5/8/4k3/5R2/8/8 b   0 1       KRRkb
13  8/8/8/k1p5/2P5/1K6/P7/8 w    0 1       KPPkp
14  8/8/4p3/4k3/8/3PKP2/8/8 w    0 1       KPPkp
15  8/3K1kn1/2Q2b2/8/8/8/8/8 w   0 1       KQkbn
16  K7/P4k2/8/8/8/8/4R3/1r6 w    0 1       KRPkr
17  5k2/8/3KPP2/8/7n/8/8/8 b     0 1       KPPkn
18  K7/8/8/5pp1/2k2N2/8/8/8 w    0 1       KNkpp
19  8/KP1n4/2k5/8/5N2/8/8/8 w    0 1       KNPkn
20  8/1K6/P7/kn6/4N3/8/8/8 w     0 1       KNPkn

Fig. 6. The start positions in standard FEN format
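Because the positions are given in FEN, the material column can be cross-checked mechanically. This sketch (a hypothetical helper, with the piece ordering K, Q, R, B, N, P assumed from the material column) parses only the piece-placement field, checks that every rank spans exactly eight squares, and rejects characters such as a lowercase 'l', which is how an OCR-damaged '1' typically shows up.

```python
ORDER = "KQRBNP"   # assumed ordering, inferred from the material column


def material_signature(placement: str) -> str:
    """Material string (e.g. 'KRkrp') from the piece-placement field of a FEN record;
    raises ValueError on malformed ranks."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")
    pieces = []
    for rank in ranks:
        width = 0
        for ch in rank:
            if ch in "12345678":
                width += int(ch)          # run of empty squares
            elif ch.upper() in ORDER:
                width += 1
                pieces.append(ch)
            else:
                raise ValueError(f"invalid character {ch!r} in rank {rank!r}")
        if width != 8:
            raise ValueError(f"rank {rank!r} spans {width} squares, not 8")
    white = sorted((p for p in pieces if p.isupper()), key=ORDER.index)
    black = sorted((p for p in pieces if p.islower()), key=lambda p: ORDER.index(p.upper()))
    return "".join(white) + "".join(black)


print(material_signature("8/R7/2r5/5p2/6k1/8/5K2/8"))
```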



Last but not least, we present some observations made during the games: (a) the cn-based programs avoid draws by move repetition, as those draws lead to small difficulties; (b) they tend to keep material on the board; and (c) they cover the idea of singular extensions. We found no simple way to characterize the differences between the CN2 and CN3 programs in chess terms.



4 Summary and Conclusion 

We presented an algorithm which has the perfect knowledge of a game database and which has been designed to bring the opponent into trouble. The algorithm is based on a theoretical error analysis of game trees. It has been implemented in a program which played a tournament against chess programs on a collection of start positions from endgame databases. No chess-dependent heuristic evaluation knowledge has been used. The program has shown that inspecting the tree structure yields clear progress.
Abstract. We consider a scenario where stops are to be placed along
an already existing public transportation network in order to improve
its attractiveness for customers. The core problem is a geometric set
covering problem which is NP-hard in general. However, if the corresponding
covering matrix has the consecutive ones property, it is solvable
in polynomial time. In this paper, we present data reduction techniques
for set covering and report on an experimental study considering real-world
data from railway systems as well as generated instances. The results
show that data reduction works very well on instances that are in
some sense “close” to having the consecutive ones property. In fact, the
real-world instances considered could be reduced significantly, in most
cases even to triviality. The study also confirms and explains findings on
similar techniques for related problems.



1 Introduction 

The continuous stop location problem consists of placing new stations or stops
along a given public transportation network in order to reach all potential
customers and thus to improve the attractiveness of public transport. In
particular, this scenario occurs in local public transport, where additional stops
in a rapid transit or bus system, e.g. close to a residential area or a settlement,
may induce a significant increase in customers. Of course, the placement of new
stops has further effects that must be considered as well. New stations require
investment and operational costs, and the travel time for customers already
using a bus or train on the affected line might increase. On the other hand, the
total number of customers and the corresponding gain increase, and the total
door-to-door travel time might decrease.

As part of a project with the largest German railway company, we focused
on a scenario where only the reachability of the network for potential customers
was considered. That is, we are given a geometric graph, a (possibly empty)
set of stations placed on the graph, a set of demand points in the plane and a
radius. Weights are assigned to all edges and vertices of the graph, representing
the increase in travel time caused by a new stop placed there. A demand point
is covered by a (possibly new) station if the station is within the given radius
of that point. The problem consists in finding a set of stations of minimum
total weight placed on the graph such that all demand points are covered. In [1]
and [2] this problem is introduced as the continuous stop location problem and
proved to be NP-hard. It is also shown that the problem can be formulated as
a set covering problem by reducing the set of potential stations to a finite set of
candidates.

* The authors acknowledge financial support from the EU for the RTN AMORE under
grant HPRN-CT-1999-00104.

Set covering problems have been studied widely, for example in the context
of crew scheduling; see [3] for an overview. As shown for example in [4], the set
covering problem is notoriously hard to solve or even to approximate. However,
it was already observed in [1] that set covering is solvable in polynomial time by
linear programming if the covering matrix has the so-called consecutive ones
property. In the geometric setting, the special case in which the network consists of
a single line satisfies the consecutive ones property. Polynomial solvability
of some similar special cases of related problems is also shown in [5].

Experiments with railway data indicate that the size of such instances can
be reduced significantly by applying a simple reduction technique already
mentioned in [6]. Therefore we examine the power of data reduction for geometric
set covering instances from the station location scenario as well as for generated
instances. In addition to the known reduction techniques, we apply a more
advanced reduction rule and a refined enumeration algorithm to solve weighted set
covering problems. It turns out that all real-world instances could be reduced
almost completely. Even the largest instances were reduced within one minute,
leaving an instance that could usually be solved within a few seconds.

One explanation for this nice behaviour is that the matrices occurring in our
setting are close to having the consecutive ones property or band-diagonal form. For
comparison, we conducted experiments on generated matrices (nearly) having the
consecutive ones property and found similar behaviour. The computational
results suggest that the effectiveness of data reduction depends strongly on the
“closeness” of the matrix to having the consecutive ones property. For the real-world data
considered, closeness to the consecutive ones property is evidently influenced by
the underlying geometry. The study also confirms and explains results reported
in [7] about the power of data reduction for similar covering problems.

In the following section, well-known reduction techniques for set covering
are introduced and our new advanced reduction technique is presented. Efficient
algorithms to implement these techniques are given, and their behaviour
is analyzed, especially for instances with the consecutive ones property. In Section 3
a refined enumeration algorithm for set covering is introduced and its running
time is analyzed. Section 4 presents our experimental study and discusses the
results.

2 Data Reduction for Set Covering 

2.1 Basic Definitions 

An instance $I := (P, F, A, w)$ of our problem is given by two finite sets $P$ and
$F$ with $m := |P|$, $n := |F|$, a positive weight function $w$ on $F$ and an $m \times n$
matrix $A = (a_{pf})$ over $\{0, 1\}$. We want to find a covering of $P$ with



762 S. Mecke and D. Wagner 



minimum weight, i.e. a subset $F' \subseteq F$ such that for each $p \in P$ there is an $f \in F'$
with $a_{pf} = 1$, and $\mathrm{cost}(F') := \sum_{f \in F'} w_f$ is minimized.

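If $a_{pf} = 1$ for a row $p$ and a column $f$ of $A$, then $p$ is covered by $f$. For
subsets $F' \subseteq F$ and $P' \subseteq P$ denote

$$\mathrm{cov}(F') := \{p \in P \mid \exists f \in F' : a_{pf} = 1\}, \qquad \mathrm{precov}(P') := \{f \in F \mid \exists p \in P' : a_{pf} = 1\}.$$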

A set $F' \subseteq F$ with $\mathrm{cov}(F') = P$ is called a feasible solution for $I$. It is
called an optimal solution for $I$ if $\mathrm{cost}(F')$ is minimum. The set of all optimal
solutions will be denoted by $\mathrm{OPT}(I)$ and the weight of an optimal solution by
$\mathrm{cost}(I)$. An instance has a feasible solution if and only if every row of $A$ contains
at least one 1. We will assume in the following that all instances are solvable.
As an instance of Set Covering is fully described by its covering matrix $A$
and the weight function $w$, we also write $I = (A, w)$ as an abbreviation. In the
following let $N := |\{(i, j) : a_{ij} = 1\}|$ denote the number of ones in $A$.

In the context of stop location, $P$ corresponds to the demand points in the
plane (population areas) and $F$ to the set of facilities (stations). Usually, an
instance of the stop location problem is given by the geometric graph, the set of
stations, the set of demand points in the plane and a radius, from which the
covering matrix $A$ is easily derived. Moreover, we consider cases
where no set of stations is given explicitly. Instead, a station can be placed at
any position on the graph (or at any of a set of given positions). As already
mentioned in the introduction, such an instance can be transformed into an
equivalent instance with a finite set of stations.
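Once the candidate stations form a finite set, deriving $A$ from the geometric data is direct. The sketch below makes two assumptions. First, distances are Euclidean in the plane. Second, the candidate set has already been discretized; the transformation from [1] and [2] is not reproduced here. The function name is hypothetical. Each row is stored as the set of columns covering it, i.e. as adjacency lists.

```python
import math


def covering_matrix(demand_points, stations, radius):
    """Return the covering matrix as one set of column indices per row.

    demand_points and stations are lists of (x, y) pairs; row p contains
    column f iff station f lies within `radius` of demand point p
    (Euclidean distance is an assumption of this sketch).
    """
    rows = []
    for px, py in demand_points:
        covering = {
            f for f, (sx, sy) in enumerate(stations)
            if math.hypot(px - sx, py - sy) <= radius
        }
        rows.append(covering)
    return rows


# Stations along a single straight line: each row covers an interval of columns.
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
points = [(0.5, 0.5), (2.5, 0.4)]
A = covering_matrix(points, stations, radius=1.0)
N = sum(len(row) for row in A)          # number of ones, as in the text
feasible = all(A)                       # every row needs at least one 1
print(A, N, feasible)                   # [{0, 1}, {2, 3}] 4 True
```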

Definition 2.1. A matrix over {0, 1} has the strong consecutive ones property
if the ones in every row are consecutive. It has the (simple) consecutive ones
property (C1P) if its columns can be permuted such that the resulting matrix
has the strong consecutive ones property.

For a matrix with C1P, a permutation of its columns that yields a matrix
with the strong consecutive ones property can be found in linear time using
PQ-trees (cf. [8]).
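For a given column order, checking the strong property only requires that the ones of each row form a contiguous block. The sketch below uses hypothetical helper names and stores each row as the set of its column indices. It does not search for a suitable permutation, which is the job of the PQ-tree algorithm of [8]. Instead, it shows that a matrix can fail the strong property in one column order and satisfy it after a permutation.

```python
def has_strong_c1p(rows):
    """True iff the ones of every row are consecutive in the current column order."""
    for cols in rows:
        if cols and max(cols) - min(cols) + 1 != len(cols):
            return False
    return True


def permute_columns(rows, order):
    """Reorder columns: order[k] is the original index of the column placed at position k."""
    position = {f: k for k, f in enumerate(order)}
    return [{position[f] for f in cols} for cols in rows]


A = [{0, 1}, {0, 2}]                                  # rows of a 2 x 3 matrix
print(has_strong_c1p(A))                              # False
print(has_strong_c1p(permute_columns(A, [1, 0, 2])))  # True, so A has C1P
```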

2.2 Reduction 

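Definition 2.2. For columns $f$ and $g$ of $A$, $f$ is dominated by $g$ if either
$\mathrm{cov}(f) \subseteq \mathrm{cov}(g)$ and $w_f \ge w_g$, or $\mathrm{cov}(f) = \emptyset$. For rows $p$ and $q$ of $A$, $p$ is
dominated by $q$ if $\mathrm{precov}(q) \subseteq \mathrm{precov}(p)$.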

For an instance $(P, F, A, w)$ we use the following terminology:

- If there are rows $p \neq q$ such that row $p$ is dominated by $q$, we call the
instance reducible by rows. If there are columns $f \neq g$ such that column $f$ is
dominated by $g$, we call the instance reducible by columns. An instance that
is neither reducible by rows nor by columns is called completely reduced.

- A reduction step is denoted by a triple $(M, i, k)$, where $M \in \{\mathrm{row}, \mathrm{col}\}$, $i$ and $k$
denote two rows (resp. columns), and $i$ is dominated by $k$.

- Let $\sigma = (\mathrm{row}, i, k)$ (resp. $\tau = (\mathrm{col}, i, k)$) denote a reduction step. By $\sigma A$
(resp. $\tau A$) we denote the matrix resulting from deleting row (resp. column)
$i$. The corresponding instance is denoted by $\sigma I$ (resp. $\tau I$).

It is a well-known fact (see e.g. [6]) that dominated rows and columns can be
deleted without changing solvability. Moreover, the value of an optimal solution
is maintained. More precisely:

Lemma 2.3. Let $\sigma$ be a reduction step for an instance $I$. Then

$$\mathrm{cost}(\sigma I) = \mathrm{cost}(I) \quad \text{and} \quad \mathrm{OPT}(\sigma I) \subseteq \mathrm{OPT}(I).$$

Accordingly, in order to solve an instance of Set Covering, one can first
apply a sequence of reduction steps and then solve the (completely) reduced
instance. It can even be shown that a completely reduced instance is, in a certain
sense, unique:

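Theorem 2.4. Let $\mathcal{I} = (\mathcal{A}, \gamma)$ be an instance and let $I = (A, c)$ and $J = (B, d)$ be two
completely reduced instances derived from $\mathcal{I}$. Then, for a suitable
permutation $\pi$ of the rows and columns of $A$,

$$\pi A = B \quad \text{and} \quad \pi c = d \tag{1}$$

and thus

$$\mathrm{cost}(I) = \mathrm{cost}(\mathcal{I}) = \mathrm{cost}(J) \tag{2}$$

$$\mathrm{OPT}(I) \subseteq \mathrm{OPT}(\mathcal{I}) \supseteq \mathrm{OPT}(J) \tag{3}$$

$$|\mathrm{OPT}(I)| = |\mathrm{OPT}(J)|. \tag{4}$$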

We omit a formal proof here. Note that (2) already follows from Lemma 2.3.
Equation (1) follows from the fact that, for two different reduction steps $\sigma$ and $\tau$
for a matrix $A$, either $\tau A = \sigma A$ or $\tau$ is still a valid reduction step for $\sigma A$. Using
this observation, one can derive a bijection between the sequences of reduction
steps leading to $I$ and $J$, respectively. □

2.3 Advanced Reduction 

In many cases, a column is dominated in the sense that a combination of two
or more other columns would dominate it. Consider a column $f$ and a set of
columns $G$ such that $\mathrm{cov}(f) \subseteq \mathrm{cov}(G)$ and

$$w_f \ge \sum_{g \in G} w_g.$$

Then $f$ is dominated by $G$. If a column $f$ is dominated by a set of columns $G$,
we call its deletion an advanced reduction step. However, it seems computationally
infeasible to consider all combinations of other columns for each column $f$. Instead,
we apply the following approach. For every row $r$, denote by $f_{\min}(r)$
a column in $\mathrm{precov}(r)$ with minimal weight. Then column $f$ is dominated by
$g$ together with the set of columns $\{f_{\min}(r) \mid r \in \mathrm{cov}(f) - \mathrm{cov}(g)\}$ if
$\mathrm{cov}(f) \cap \mathrm{cov}(g) \neq \emptyset$ and

$$w_f \ge w_g + \sum_{r \in \mathrm{cov}(f) - \mathrm{cov}(g)} w_{f_{\min}(r)}.$$

For this advanced reduction, results analogous to Lemma 2.3 and Theorem 2.4
can be proved. Note, however, that by applying advanced reduction
certain optimal solutions might be lost. Consider for example the covering matrix
$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$ with weights $(2, 1, 1)$. It is not reducible by simple reduction, and its optimal
solutions are $S_1 = \{1\}$ and $S_2 = \{2, 3\}$. By advanced reduction it can be reduced
to $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, which has only the two-element optimal solution. In the next section we will show that
advanced reduction can be implemented without increasing the asymptotic
worst-case running time of the reduction.
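The advanced rule can be sketched directly from its definition. The sketch below stores each column as the set of rows it covers (`cov`) and uses 0-based indices; the function names are hypothetical. Unlike the text, it recomputes $f_{\min}$ over the remaining columns after every deletion rather than once. This is simpler but slower than the bound discussed for the reduction algorithm. On the example above it deletes the first column and leaves the unit matrix.

```python
def advanced_dominated(f, cov, weights, alive):
    """Test the advanced rule for column f against every other live column g.

    cov[h] is the set of rows covered by column h; alive lists the live columns.
    """
    # f_min(r): a cheapest live column covering row r
    fmin = {}
    for h in alive:
        for r in cov[h]:
            if r not in fmin or weights[h] < weights[fmin[r]]:
                fmin[r] = h
    for g in alive:
        if g == f or not (cov[f] & cov[g]):
            continue
        # rows of f that g misses are paid for by their cheapest covering column;
        # if that column is f itself, the test fails because weights are positive
        bound = weights[g] + sum(weights[fmin[r]] for r in cov[f] - cov[g])
        if weights[f] >= bound:
            return True
    return False


def advanced_reduce(cov, weights):
    """Delete one dominated column at a time until none is left; return live columns."""
    alive = list(range(len(cov)))
    changed = True
    while changed:
        changed = False
        for f in alive:
            if advanced_dominated(f, cov, weights, alive):
                alive.remove(f)
                changed = True
                break
    return alive


# The example above, 0-based: column 0 covers both rows, columns 1 and 2 one each.
print(advanced_reduce([{0, 1}, {0}, {1}], [2, 1, 1]))   # [1, 2]
```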

2.4 Reduction Algorithm 

We briefly review the algorithmic realization of reduction. In the following, we
consider matrices obtained by reductions consisting of reduction steps as well as
advanced reduction steps.

Theorem 2.5. A matrix can be reduced by columns in $O(nN)$ and by rows in
$O(mN)$. Furthermore, a matrix can be reduced completely in $O(Nmn)$.

Proof. Compare each pair of columns and delete dominated columns. If the
matrix is represented as $n$ adjacency lists, comparing one column to all
others costs $O(N)$, which leads to $O(nN)$ steps. Note that for advanced
reductions the $f_{\min}(r)$ have to be computed only once; obviously, the cost of
computing them is within $O(nN)$ as well. Reduction by rows is analogous.
For complete reduction of a matrix, reductions by rows and reductions by
columns are applied alternately until no reduction step at all is applicable. In
each reduction step at least one row or one column is deleted. Accordingly,
the procedure is finished after $O(\min(m, n))$ alternations. This induces cost
$O(Nmn)$ for complete reduction. □
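The alternating procedure from the proof can be sketched as follows. This version is deliberately naive and uses hypothetical names. It compares sets directly instead of using the adjacency-list bookkeeping behind the $O(nN)$ bound, and it applies only simple reduction steps. It assumes a solvable instance, because an empty row would dominate every other row. Rows are given as `precov` sets, and indices are 0-based.

```python
def find_dominated_row(precov, rows, cols):
    """Return a live row p with precov(q) ⊆ precov(p) for another live row q, else None."""
    for p in sorted(rows):
        pre_p = precov[p] & cols
        for q in sorted(rows):
            if q != p and (precov[q] & cols) <= pre_p:
                return p
    return None


def find_dominated_column(precov, weights, rows, cols):
    """Return a live column that is empty or dominated by another live column, else None."""
    cov = {f: {p for p in rows if f in precov[p]} for f in cols}
    for f in sorted(cols):
        if not cov[f]:
            return f
        for g in sorted(cols):
            if g != f and cov[f] <= cov[g] and weights[f] >= weights[g]:
                return f
    return None


def reduce_completely(precov, weights):
    """Alternate row and column reduction until neither applies; return live rows and columns."""
    rows = set(range(len(precov)))
    cols = set(range(len(weights)))
    while True:
        changed = False
        p = find_dominated_row(precov, rows, cols)
        while p is not None:              # reduction by rows
            rows.remove(p)
            changed = True
            p = find_dominated_row(precov, rows, cols)
        f = find_dominated_column(precov, weights, rows, cols)
        while f is not None:              # reduction by columns
            cols.remove(f)
            changed = True
            f = find_dominated_column(precov, weights, rows, cols)
        if not changed:
            return sorted(rows), sorted(cols)


# Unweighted strong-C1P example: rows {0,1}, {1,2}, {2,3} collapse to a 2 x 2 unit matrix.
print(reduce_completely([{0, 1}, {1, 2}, {2, 3}], [1, 1, 1, 1]))   # ([0, 2], [1, 2])
```

Applied to the unweighted example, the result is consistent with the observation in Section 2.5 that equally weighted C1P instances reduce to the unit matrix.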

Remark 2.6. If the matrix is very sparse, the analysis can be improved further.
Let the number of ones in every row and every column be smaller than a constant
$c$. Then every column has to be compared with at most $c^2$ other columns, where
each comparison costs $O(c)$. Hence the matrix is reduced by columns within
$O(c^3 n)$ steps. As before, it follows that complete reduction can be done
in $O(nm)$.

Remark 2.7. Theorem 2.5 can be slightly improved using results of [9] and [10].
These papers present a lower bound and algorithms for the problem of finding
extremal sets, which is equivalent to reduction by columns (resp. rows). In [9] a
lower bound of $\Omega(N^2/\log^2 N)$ is proved, and an algorithm with time complexity
$O(N^2/\log N)$ is given in [10], which is only slightly better than the running
time of our much simpler approach.
2.5 Instances with Consecutive Ones Property

Instances with C1P can be reduced very effectively. A matrix with C1P maintains
this property after every reduction step. If the weights are all equal, it can even be
reduced to the unit matrix. For the weighted case, reduction can still be done very
efficiently by establishing strong C1P and then sorting the rows lexicographically.

Remark 2.8. Note that a reduced matrix with C1P for rows also has C1P for columns.

Set Covering for matrices with C1P can be solved by applying a shortest
path algorithm in a graph with n nodes, as described in [11]. In our experiments,
we used Algorithm 1, which is exponential for general instances but polynomial
on instances with C1P.
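With strong C1P, the columns covering each row $p$ form an interval $[l_p, r_p]$. A cover therefore amounts to choosing weighted columns so that every interval contains a chosen one. The sketch below solves this as a shortest path in a DAG over the columns plus a virtual source and sink. It is a standard formulation and not necessarily the construction of [11]; the function name is hypothetical. An edge from column $i$ to column $j > i$ is allowed only if no row interval lies strictly between them.

```python
def min_weight_interval_cover(intervals, weights):
    """Cheapest set of columns hitting every row interval (l, r), inclusive and 0-based.

    Returns (cost, chosen columns); cost is inf if the sink is unreachable.
    """
    n = len(weights)
    best = [float("inf")] * (n + 2)   # best[c + 1]: cheapest valid choice whose last column is c
    prev = [None] * (n + 2)
    best[0] = 0                       # virtual source, column -1
    for c in range(n + 1):            # c == n is a virtual sink of weight 0
        # an interval ending before c and starting after the predecessor p would
        # stay unhit, so p must be at least the largest start of such intervals
        lo = max((l for l, r in intervals if r < c), default=-1)
        w = weights[c] if c < n else 0
        for p in range(lo, c):        # p == -1 is the source
            if best[p + 1] + w < best[c + 1]:
                best[c + 1] = best[p + 1] + w
                prev[c + 1] = p
    chosen = []
    c = prev[n + 1]
    while c is not None and c >= 0:   # walk back from the sink to the source
        chosen.append(c)
        c = prev[c + 1]
    return best[n + 1], sorted(chosen)


print(min_weight_interval_cover([(0, 1), (1, 2)], [1, 5, 1]))   # (2, [0, 2])
```

The double loop makes this sketch quadratic in the number of columns, which is adequate for illustration.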

3 A Refined Enumerative Algorithm 

For solving a completely reduced set covering problem, it is favorable to apply
an algorithm that takes into account the (observed) special structure of the
real-world instances. More precisely, we aim at an algorithm that is especially
effective for instances that are “close” to having C1P. Although well-established
approaches for solving integer linear programs, such as branch and bound, are
very successful in practice, they are often less suitable for such nicely structured
instances.

Alternatively, we designed the following simple but refined enumerative
solution algorithm. The basic idea is to enumerate all solutions by combining
the covering columns for each row. This leads to partial solutions for all
rows considered so far and finally to a solution for the entire instance. This
procedure is described in Algorithm 1.

Throughout Algorithm 1, the covers of the partial solutions are maintained in matrix
$B$, the weights of the columns of $B$ in array $w'$, and the partial solutions in array $S$.
The algorithm always terminates, because in every iteration at least one row is removed
from $C$. Before each iteration the following invariants hold by induction:

- For every column $f$ of $B$:
  • the partial solution $S_f$ covers all rows deleted in step 2 up to this iteration
  and the rows with a one in column $f$ of $B$;
  • $S_f$ has weight $w'_f$.

- $S$ contains a subset of an optimal solution for $I$.

The key argument is that step 1 adds all solutions covering the first row. The
correctness of the returned solution follows once $C = (1)$.

In order to keep the effort for computing the partial solutions reasonable, we
apply our reduction technique again throughout the algorithm. Of course, the
running time of Algorithm 1 is exponential in general: matrix $B$, which
contains all partial solutions obtained so far, can still become exponentially
large. For certain kinds of instances, however, we can prove some nice properties:
Algorithm 1: Combi solver

Input: Instance $I = (P, F, w, A)$
Output: Optimal solution

Start
    $B := (0) \in \{0, 1\}^{m \times 1}$, weights $w' := (0)$, solutions $S := (\emptyset)$;
    while $C \neq (1)$ do
        forall columns $f$ of $B$ with 0 in the first row do
            forall columns $g$ of $A$ with 1 in the first row do
                Add a new column $A_g + B_f$ to $B$, a weight $w_g + w'_f$ to $w'$ and
                solution $S_f \cup \{g\}$ to $S$;
            Remove column $f$;
        Remove first row of $A$ and $B$;
        Reduce $C :=$
        and remove corresponding entries of $w'$ and $S$;
    return $S_1$ and $w'_1$;
End
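The core idea of Algorithm 1 is to extend partial solutions row by row and discard dominated ones. It can be sketched as follows, as a simplified stand-in rather than a transcription. The matrix $C$ and the reduction of $A$ are not modeled. A partial solution is pruned when another one covers a superset of the remaining rows at no greater weight. Any completion of the pruned solution also completes the dominating one at no greater cost, so optimality is preserved. Rows are processed in index order, columns are given as sets of rows, and all names are hypothetical.

```python
def prune(partials):
    """Keep partial solutions not dominated by a cheaper-or-equal one with a superset cover."""
    kept = []
    # cheaper first; among equal weights, larger covers first
    for cover, weight, chosen in sorted(partials, key=lambda s: (s[1], -len(s[0]))):
        if not any(cover <= other for other, _, _ in kept):
            kept.append((cover, weight, chosen))
    return kept


def combi_solve(columns, weights, num_rows):
    """columns[g] is the set of rows covered by column g; returns (weight, chosen columns)."""
    # each partial solution: (later rows it already covers, weight, chosen columns)
    partials = [(frozenset(), 0, ())]
    for row in range(num_rows):
        extended = []
        for cover, weight, chosen in partials:
            if row in cover:                       # row already covered: keep as is
                extended.append((cover, weight, chosen))
                continue
            for g, col in enumerate(columns):      # branch on every column covering the row
                if row in col:
                    extended.append((cover | col, weight + weights[g], chosen + (g,)))
        if not extended:
            raise ValueError(f"row {row} cannot be covered")
        later = frozenset(range(row + 1, num_rows))   # forget rows already processed
        partials = prune([(c & later, w, s) for c, w, s in extended])
    weight, chosen = min(((w, s) for _, w, s in partials), key=lambda t: t[0])
    return weight, sorted(chosen)


# The 2 x 3 example of Section 2.3 (0-based): both optimal solutions cost 2.
print(combi_solve([{0, 1}, {0}, {1}], [2, 1, 1], 2))   # (2, [1, 2])
```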



Theorem 3.1. Suppose the covering matrix $A$ has strong C1P, is reduced, its rows
are sorted in lexicographic order, and $c$ is the maximum number of ones per row
and per column. Then Algorithm 1 has time complexity $O(c^3 n)$.

Proof. Let $c$ be the maximum number of ones in a column or row of $A$. If the
covering matrix $A$ has strong C1P, is completely reduced, and its rows are sorted
in lexicographic order, then, as noted in Remark 2.8, it also has strong C1P for columns.
This implies that after each iteration the first column of $B$ contains only zeroes,
all other columns have strong C1P and contain a one in the first row, and no two
columns are equal. Thus, $B$ has at most $c$ columns. So during Algorithm 1, for
each column of $A$ a new column is added to $B$ at most $c$ times. Each reduction
is in $O(c^3)$, as stated in Theorem 2.5. □

Even for a weaker property than C1P, Algorithm 1 has polynomial running
time:

Theorem 3.2. If the distance between the first and the last one in each column
of the covering matrix is at most $k$, then at any time throughout Algorithm 1
the number of columns in matrix $B$ is in $O(2^k n)$. The time complexity is in
$O(k\, m\, n^2\, 2^{2k})$.

Proof. We prove by induction that after each iteration the ones in $B$ are
contained in rows 1 to $k - 1$. As $B$ is completely reduced, no two columns are equal,
so there are at most $2^{k-1}$ different columns. In every iteration only those columns
of $A$ with a one in the first row are considered. The ones of these columns are
again contained in rows 1 to $k$. Therefore, for every column of $B$ at most $n$ new
columns are added, having all their ones in rows 1 to $k$. After addition of all new


Solving Geometric Covering Problems by Data Reduction 767 



columns there are at most $2^k n$ columns in $B$ and at most $2^k n k$ ones
in $B$. So by Theorem 2.5, reduction by columns is in $O((2^k n) \cdot 2^k n k)$. There are
at most $m$ iterations, which leads to the claimed upper bound. □

As Theorem 3.2 indicates, the ordering of the rows is crucial for the running
time of the algorithm. We have had good experience with the Cuthill-McKee
ordering (cf. [12]), in which the rows are chosen in their order of appearance
in a breadth-first search on the covering graph with adjacency matrix
$\begin{pmatrix} 0 & A \\ A^{T} & 0 \end{pmatrix}$. A simple online heuristic that always tries to minimize the size of $B$
in the next step leads to even better running times.
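A Cuthill-McKee-style row order can be sketched as a breadth-first search on the bipartite graph whose vertices are the rows and columns of $A$. Each component starts at a row of minimum degree, and neighbours are visited by increasing degree. The helper names are hypothetical and the tie-breaking is an assumption. `column_spread` computes the largest distance between the first and last one of any column under a given row order. This is the smallest $k$ for which Theorem 3.2 applies.

```python
from collections import deque


def cuthill_mckee_row_order(rows):
    """Order rows by BFS on the bipartite row/column graph (Cuthill-McKee style).

    rows[p] is the set of columns with a one in row p.
    """
    cols = {}
    for p, cs in enumerate(rows):
        for f in cs:
            cols.setdefault(f, set()).add(p)
    seen_rows, seen_cols, order = set(), set(), []
    # start every connected component at an unvisited row of minimum degree
    for start in sorted(range(len(rows)), key=lambda p: len(rows[p])):
        if start in seen_rows:
            continue
        seen_rows.add(start)
        queue = deque([("row", start)])
        while queue:
            kind, v = queue.popleft()
            if kind == "row":
                order.append(v)
                for f in sorted(rows[v] - seen_cols, key=lambda f: len(cols[f])):
                    seen_cols.add(f)
                    queue.append(("col", f))
            else:
                for p in sorted(cols[v] - seen_rows, key=lambda p: len(rows[p])):
                    seen_rows.add(p)
                    queue.append(("row", p))
    return order


def column_spread(rows, order):
    """Largest distance between the first and last one of any column under a row order."""
    position = {p: i for i, p in enumerate(order)}
    first, last = {}, {}
    for p, cs in enumerate(rows):
        for f in cs:
            i = position[p]
            first[f] = min(first.get(f, i), i)
            last[f] = max(last.get(f, i), i)
    return max((last[f] - first[f] for f in first), default=0)


rows = [{2}, {0, 1}, {0}, {1, 2}]
order = cuthill_mckee_row_order(rows)
print(order, column_spread(rows, range(4)), column_spread(rows, order))   # [0, 3, 1, 2] 3 1
```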

4 Experimental Results 

The implementation is in C++ and runs on an i686 Linux platform. The experiments
were performed on a Xeon(TM) CPU with 2.80 GHz and 2 GB RAM.

4.1 Instances 

Real World Instances. We tested our algorithm on different data sets. The
original motivation of this work was a project in collaboration with the largest
German railway company, whose goal was to optimize the accessibility of local
train service for customers. We therefore had access to data on the German
railway network (circa 8200 nodes and 8700 edges) and German settlements (circa
30000). We tested different radii; according to the railway company, radii of two
to five kilometers are the most realistic.

We also considered two different versions of the problem. The first version contains
all settlements that are within the given radius of the railway network; the second
version contains only those settlements that are not already covered by one of
the (approx. 6800) existing German railway stations. The edges and nodes were
weighted by the number of customers traveling through them.



Generated Instances. As a second data set we considered randomly generated
instances with C1P and suitably modified instances having almost C1P. We
experimented with unweighted instances (all weights equal to 1) and instances
where the weights were chosen randomly (uniformly distributed) between 1 and a
maximum weight of 10 or 100. The matrices consist of 50 to 50000 rows with 10
to 200 non-zero entries per row. Up to 20% of the ones were randomly flipped
to zero, resulting in perturbed instances.
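Instances of this kind can be generated roughly as follows. The sketch is an assumption about the setup and not the authors' generator. Each row receives a random interval of consecutive ones, which gives strong C1P. Each one is then flipped to zero independently with probability `flip_rate`. A row that would become empty keeps one randomly chosen one, so the instance stays solvable. Weights are drawn uniformly from 1 to `max_weight`; use 1 for unweighted instances.

```python
import random


def generate_instance(m, n, ones_per_row, max_weight, flip_rate, seed=None):
    """Return (rows, weights); rows[p] is the set of columns with a one in row p."""
    if not 1 <= ones_per_row <= n:
        raise ValueError("ones_per_row must lie between 1 and n")
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        start = rng.randrange(n - ones_per_row + 1)
        interval = range(start, start + ones_per_row)       # strong C1P row
        kept = {f for f in interval if rng.random() >= flip_rate}
        if not kept:                                        # keep the instance solvable
            kept = {rng.choice(interval)}
        rows.append(kept)
    weights = [rng.randint(1, max_weight) for _ in range(n)]
    return rows, weights


rows, weights = generate_instance(1000, 500, 10, 10, 0.1, seed=42)
```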

4.2 Performance of Reduction 

Real World Instances. The original problem was to solve instances
that contain only those settlements that are not already covered by existing
railway stations. Our experiments on practical data showed that these are always
reduced to almost trivial instances in very short time.
Moreover, these instances tend to become easier to solve with increasing
radius. For large radii many settlements are already covered by old stations and
the instance decomposes into many small components.

The situation is different for instances without existing stations. As an
illustrative example of the effect of reduction, see the figure on the left. It
shows a section of the railway graph near the city of Karlsruhe; the covering
radius was 2.5 km. Settlements are drawn as squares and candidate stop locations
as ellipses. The original instance is colored grey, and the instance remaining after
reduction is black. The solution for the shown part of the remaining instance
is already trivial. The largest remaining component of the covering matrix after
reduction contained 8 rows and 9 columns.

With increasing radius, however, these instances become quite complex in terms
of total matrix size and the number and size of independent components. Nevertheless,
reduction is still very effective here. Figure 1 summarizes reduction rates for
the German railway graph and radii between 1 and 10 km. In these examples
advanced reduction is much more effective than simple reduction, especially for large
radii. It even seems that the size of the reduced matrix does not grow with
increasing radius.




Generated Instances. Looking at the C1P matrices, we first observe
that the unweighted instances can be reduced completely, that is, to the trivial
unit matrix. The weighted matrices could not be reduced completely by simple
reduction, but almost completely by advanced reduction.

Perturbed instances are harder to reduce. For weighted, perturbed instances
simple reduction is not very effective, but advanced reduction yields very good
results, as Figure 2 illustrates. Figure 3(a) shows that unweighted instances with
10% perturbation could usually be reduced by a factor of 10, and instances with 20%
only by a factor of around 2.



Increasing the Radius. Of course, N grows with increasing radius, but
the combinatorial structure of the instances also changes. This affects the
effectiveness of reduction. As r increases, the reduction ratio first decreases, then
increases for moderate radii and finally drops to almost zero for very large radii
(the left side of Figure 4 shows this effect for a sub-instance of the German railway
graph). For generated instances this effect could only be simulated to a certain extent
by increasing the number of ones per row (see the right side of Figure 4).
Fig. 1. Effectiveness of reduction for the German railway graph.
Fig. 2. Reduction ratios for unperturbed and 20% perturbed generated instances.
Fig. 5. Reduction time and solve time vs. matrix size, weights ∈ (0..10), 10% perturbation.



4.3 Time Complexity 

As noted before, real-world instances with existing stops can be reduced very
fast, because they decompose into many small components. For the variants
without existing stations, although the reduction rates were always very good,
the running time for solving the instances grew quite fast for radii of around
10 km and above. Note, however, that radii larger than 10 km are irrelevant in our
scenario. For moderate radii we observed quadratic or slightly subquadratic
behaviour (in terms of the matrix size) of the running times for reduction and
an almost linear running time for solving the set covering problem.

Looking at the generated instances, our experiments suggest a quadratic time
complexity for reduction and an almost linear time complexity for solving the
set covering problem (Figure 5). Only for the 20% perturbed, unweighted C1P
matrices did we observe quadratic behaviour.
For weighted instances, advanced reduction also improved the running time
of our solver. Figure 3(b) shows this for the case of 10% perturbed, weighted,
generated instances. A speedup factor of 100 and more could be observed here.
Even the running time for advanced reduction was usually reduced, owing to
the reduced matrix size after the first rounds of reduction.

5 Conclusions 

The station location problem was solved almost completely by reduction for our
initial real-world data. The experimental study confirms the conjecture that one
reason for this nice behaviour is the closeness of our instances to C1P. This
also explains the results from [7] about the power of data reduction for similar
covering problems. Indeed, it is likely that the data considered there in many
cases have (at least almost) C1P.

References 

1. Hamacher, H.W., Liebers, A., Schöbel, A., Wagner, D., Wagner, F.: Locating new
stops in a railway network. Electronic Notes in Theoretical Computer Science 50
(2001), Proc. ATMOS 2001

2. Schöbel, A., Hamacher, H.W., Liebers, A., Wagner, D.: The continuous stop location
problem in public transportation networks. Report in Wirtschaftsmathematik
81 (2002)

3. Caprara, A., Fischetti, M., Toth, P.: Algorithms for the set covering problem. 
Annals of Operations Research 98 (2000) 353-371 

4. Lund, C., Yannakakis, M.: On the hardness of approximating minimization prob- 
lems. Journal of the ACM 41 (1994) 960-981 

5. Kranakis, E., Penna, P., Schlude, K., Taylor, D.S., Widmayer, P.: Improving customer
proximity to railway stations. In Petreschi, R., Persiano, G., Silvestri, R.,
eds.: Proceedings of 5th Italian Conference on Algorithms and Complexity (CIAC
2003). Volume 2653 of Lecture Notes in Computer Science, Springer (2003)
264-276

6. Nemhauser, G., Wolsey, L.: Integer and Combinatorial Optimization. Wiley (1988) 

7. Weihe, K.: Covering trains by stations or the power of data reduction. In Battiti,
R., Bertossi, A. A., eds.: Proceedings of Algorithms and Experiments (ALEX 98). 
(1998) 1-8 

8. Booth, K.S., Lueker, G.S.: Testing for the consecutive ones property, interval
graphs, and graph planarity using PQ-tree algorithms. Journal of Computer and
System Sciences 13 (1976) 335-379

9. Yellin, D.M., Jutla, C.S.: Finding extremal sets in less than quadratic time. Inform. 
Process. Lett. 48 (1993) 29-34 

10. Pritchard, P.: An old sub-quadratic algorithm for finding extremal sets. Informa- 
tion Processing Letters 62 (1997) 329-334 

11. Schöbel, A.: Customer-Oriented Optimization in Public Transportation. Habilitation
thesis, University of Kaiserslautern (2002)

12. Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In:
Proc. ACM National Conference, Association for Computing Machinery (1969)



Efficient IP Table Lookup via Adaptive 
Stratified Trees with Selective Reconstructions* 
(Extended Abstract) 



Marco Pellegrini and Giordano Fusco 

Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, via Moruzzi 1,
56124 Pisa, Italy, {marco.pellegrini, giordano.fusco}@iit.cnr.it



Abstract. IP address lookup is a critical operation for high-bandwidth
routers in packet switching networks such as the Internet. The lookup is a
non-trivial operation since it requires searching for the longest prefix,
among those stored in a (large) given table, matching the IP address.
Ever-increasing routing table sizes, traffic volumes and link speeds demand
new and more efficient algorithms. Moreover, the imminent move
to IPv6 128-bit addresses will soon require a rethinking of previous
technical choices. This article describes a new data structure for solving
the IP table lookup problem, christened the Adaptive Stratified Tree
(AST). The proposed solution is based on casting the problem in
geometric terms and on repeated application of efficient local geometric
optimization routines. Experiments with this approach have shown that
in terms of storage, query time and update time the AST is on a par with
state-of-the-art algorithms based on data compression or string
manipulations (and is often better on some of the measured quantities).



1 Introduction 

Motivation for the Problem. The Internet is surely one of the great scientific,
technological and social successes of the last decade, and an ever-growing range
of services relies on the efficiency of the underlying switching infrastructure. Thus
improvements in the throughput of Internet routers are likely to have a large
impact. The IP address lookup mechanism is a critical component of an Internet
packet switch (see [1] for an overview). Briefly, a router within the network holds
a lookup table with n entries, where each entry specifies a prefix (the maximum
length w of a prefix is 32 bits in the IPv4 protocol and 128 bits in the
soon-to-be-deployed IPv6 protocol) and a next-hop exit line. When a packet arrives at the
router, the destination address in the packet header is read, the longest
prefix in the table matching the destination is sought, and the packet is sent to
the corresponding next-hop exit line. How to solve this problem so as to be able to
handle millions of packets per second has been the topic of a large number of
research papers in the last 10-15 years. The steady increase in the size of lookup
tables and the relentless demand for traffic performance put pressure on the research
community to come up with faster schemes.
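As a baseline, the lookup described above can be written as a linear scan over the table. The sketch uses Python's standard `ipaddress` module; the table format and the function name are assumptions for illustration. A real router needs a dedicated data structure rather than an O(n) scan per packet, and that need is what motivates the work.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the longest prefix in `table` containing `address`, or None.

    table maps prefixes such as '10.0.0.0/8' to next-hop labels (linear scan).
    """
    addr = ipaddress.ip_address(address)
    best_len, best_hop = -1, None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop


table = {"0.0.0.0/0": "A", "10.0.0.0/8": "B", "10.1.0.0/16": "C"}
print(longest_prefix_match(table, "10.1.2.3"))     # C
print(longest_prefix_match(table, "10.2.0.1"))     # B
print(longest_prefix_match(table, "192.168.1.1"))  # A
```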

* Work partially supported by the Italian Registry of ccTLD .it. 
Table 1. L2-normalized costs for points of 8 bits. Measurements in clock ticks. The first,
negative entry is due to the inherent approximation in the measurement and
normalization of the raw data; as an outlier, it is not used in the construction of the
interpolation formula.



Position \ Level     1     2     3     4     5
Empty               -3    20    42    66   116
1                   12    30    65    93   121
2                   12    35    67    94   125
3                   16    35    64   100   128
4                   22    40    69   101   129



Our Contribution. In the IP protocol the address lookup operation is equivalent
to searching for the longest prefix match (lpm) of a fixed-length string amongst
a stored set of prefixes. It is well known that the longest prefix match problem
can also be mapped into the problem of finding the shortest segment on a line
containing a query point. This is called the geometric (or locus) view, as opposed
to the previous string view. The geometric approach was proposed in [2] and [3];
it has been extended in [4] and in several recent papers [5,6]. It has been used in
[7] to prove formally the equivalence among several different problems. Important
asymptotic worst case bounds have been found following the geometric point of
view but, currently, the algorithms for IP lookup with the best empirical
performance, including the one in [7], are based on the string view and use
optimized tries, data compression techniques, or both. The question we raise is
whether the geometric view is valuable for the IP lookup problem only as a
theoretical tool, or whether it is also a viable tool from a practical point of view.
Since our objective is practical, we chose for the moment not to rely on the
sophisticated techniques that have led to the recent improvements in worst case
bounds (e.g. those in [4,5,6]). Instead we ask ourselves whether simpler search
trees, built with the help of local optimization routines, could lead to the stated
goal. The experiments with the AST method strongly suggest a positive answer on
both counts: (1) it is indeed possible to attain state-of-the-art performance in IP
lookup using the geometric view, and (2) local adaptive optimization is a key
ingredient for attaining this goal.
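The geometric view can be illustrated with a small Python sketch (hypothetical table; closed integer intervals assumed): each prefix becomes the segment of 32-bit addresses it matches, and the answer for a destination is the next hop of the shortest segment containing it. Because prefix segments are either nested or disjoint, the shortest containing segment corresponds exactly to the longest matching prefix.

```python
import ipaddress
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # closed interval [lo, hi] of integer addresses

def prefix_to_segment(prefix: str) -> Segment:
    """Map a prefix to the closed segment of addresses it matches."""
    net = ipaddress.ip_network(prefix)
    return int(net.network_address), int(net.broadcast_address)

def shortest_covering_segment(segments: List[Tuple[Segment, str]], point: int) -> Optional[str]:
    """Next hop of the shortest segment containing `point` (geometric view of lpm)."""
    best_width, best_hop = None, None
    for (lo, hi), hop in segments:
        if lo <= point <= hi and (best_width is None or hi - lo < best_width):
            best_width, best_hop = hi - lo, hop
    return best_hop

# Hypothetical lookup table, turned into labeled segments on the line
table = [("0.0.0.0/0", "default"), ("10.0.0.0/8", "A"),
         ("10.1.0.0/16", "B"), ("10.1.2.0/24", "C")]
segments = [(prefix_to_segment(p), hop) for p, hop in table]

if __name__ == "__main__":
    q = int(ipaddress.ip_address("10.1.9.9"))
    print(shortest_covering_segment(segments, q))   # B
```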

Methodology. Since this is an experimental paper and the evaluation of the
experimental data is critical, we describe the experimental methodology up front.
Performance is established by means of experiments which are described in detail
in Sect. 3. The experimental methodology we adopted follows quite closely the
standards adopted in the papers describing the methods we compare to. Given
the economic relevance of Internet technologies, such state-of-the-art methods
have been patented or are in the process of being patented and, to the best of
our knowledge, the corresponding codes (either as source or as executables) are
not in the public domain. We refrained from re-implementing these methods
from their published descriptions, since we are aware that at this level
Table 2. List of input lookup tables, with geometric statistics on active, duplicate
and phantom points.

 #  Routing Table      Date         Entries   Total     Active    Phantom   Duplicate
                                              Points    Points    Points    Points
 1  Paix               05/30/2001    17,766    35,532    17,807     9,605     8,120
 2  AADS               05/30/2001    32,505    65,010    32,642    17,071    15,297
 3  Funet              10/30/1997    41,328    82,656    24,895    35,391    22,370
 4  Pac Bell           05/30/2001    45,184    90,368    34,334    33,750    22,284
 5  Mae West           05/30/2001    71,319   142,638    53,618    50,217    38,803
 6  Telstra            03/31/2001   104,096   208,192    64,656    79,429    64,107
 7  Oregon             03/31/2001   118,190   236,380    45,299   116,897    74,184
 8  AT&T US            07/10/2003   121,613   243,226   102,861    62,910    77,455
 9  Telus              07/10/2003   126,282   252,564    85,564    86,738    80,262
10  AT&T East Canada   07/10/2003   127,157   254,314    78,625    93,795    81,894
11  AT&T West Canada   07/10/2003   127,248   254,496    78,708    93,824    81,964
12  Oregon             07/10/2003   142,475   284,950   108,780    84,165    92,005

Table 3. Comparisons of IP methods for a 700 MHz Pentium III (T = 1.4 ns) with
1024 KB L2 Cache, L2 delay is D = 15 nsec, L1 latency m = 4 nsec.

Ref          Method            Input size   Formula      Time (ns)   Memory (KB)
[13]         Lulea             38141        53T + 12D    254         160
[10]         Variable stride   38816        26T + 3D     81          655
[14]         Exp./Con.         44024        3T + 3D      49          1057
This paper   Slim AST          45184        m + 3D       49          464
This paper   Slim AST          142475       m + 3D       49          941


of performance (a few dozen nanoseconds) overlooking seemingly simple
implementation details could result in grossly unfair comparisons. Given these
difficulties, we settled for an admittedly weaker but still instructive comparison,
exploiting the fact that those papers either present an explicit formula for
mapping their performance onto different machines, or such formulae can be
derived almost mechanically from their description in words. A second issue was
deciding the data sets to be used. Papers from the late nineties used table
snapshots usually dating from the mid nineties, when a table of 40,000 entries
was considered a large one. Although recovering such old data is probably
feasible and using them certainly interesting, the evolution of the Internet in
recent years has been so rapid that the outcome of comparisons based on outdated
test cases is certainly open to the criticism of not being relevant for the present
(and the future) of the Internet. Therefore the comparisons shown in Table 3, where
we use data sets of comparable size and the mapping onto a common architecture
given by the formulae, should be considered only in a broad qualitative sense.
The main general conclusion we feel we can safely draw is that the geometric view
is as useful as the insight drawn from string manipulation and data compression.
We remark that the extensive testing with 12 lookup tables of size ranging from
Table 4. Slim-AST data structure performance. Tested on a 700 MHz Pentium III
(T = 1.4 ns) with 1024 KB L2 Cache, L2 delay is D = 15 nsec, L1 latency m = 4 nsec.
The parameters used are: B = 16, C = 10~^, D = 61 for all tables except (*) where
C = 1.

 #  Routing Table      Entries   Target Cache   Total Memory   Worst query     Corresponding
                                 (KB)           (bytes)        (clock-ticks)   class (l, p, a)
 1  Paix                17,766    512            374,314        35              2, 2, 8
 2  AADS                32,505    512            460,516        35              2, 2, 8
 3  Funet               41,328    512            453,034        30              2, 1, 8
 4  Pac Bell            45,184    512            464,063        35              2, 2, 8
 5  Mae West            71,319   1024            610,587        35              2, 2, 8
 6  Telstra            104,096   1024            780,452        35              2, 2, 16
 7  Oregon             118,190   1024            595,919        35              2, 3, 8
 8  AT&T US            121,613   1024            955,702        30              2, 1, 8
 9  Telus (*)          126,282   1024            878,179        40              2, 4, 8
10  AT&T East Canada   127,157   1024            878,155        40              2, 4, 8
11  AT&T West Canada   127,248   1024            878,491        40              2, 4, 8
12  Oregon (*)         142,475   1024            940,810        35              2, 2, 16

17,000 to 142,000 (reported in Tables 2 and 4) confirms the reliability and
robustness of the AST.

Previous Work. The literature on the IP lookup problem and its higher
dimensional extension is rather rich, and we refer to the full version for a survey.
Work most relevant to the AST is mentioned as needed.

Summary. The AST is introduced in Sect. 2.1. In Sect. 2.2 the local optimiza- 
tion routines are described. Experiments and results for storage and query time 
are in Sect. 3, while dynamic operations are discussed in Sect. 4. 



2 The AST 

2.1 The AST in a Nutshell 

The AST algorithm is described in Section 2.2. Here we summarize the main
new ideas. The best way to explain the gist of the algorithm is to visualize
it geometrically. We first map the lookup problem into a predecessor search
problem using the standard method. The equivalent input for the predecessor
problem is a set of labeled points on the real line. We eliminate from the point
set duplicates and points that have the same label to their left and right
(phantom points). We want to split this line into a grid of equal-size buckets, and
then proceed recursively and separately on the points contained in each grid
bucket. A uniform grid is completely specified by giving an anchor point a and
the step s of the grid. During the query, finding the bucket containing the query
point p is done easily in O(1) time by evaluating the expression
$\lfloor (p - a)/s \rfloor$, which gives the offset from the special bucket containing
the anchor. We take care to choose s as a power of two, so as to reduce the integer
division to a right shift. If we choose the step s too short we might end up with too
many empty buckets, which implies a waste of storage. We thus choose s as follows:
choose the smallest s for which the ratio of empty to occupied buckets is no more
than a user-defined constant threshold. On the other hand, shifting the grid (i.e.
moving its anchor) can have dramatic effects on the number of empty buckets,
occupied buckets and the maximum number of keys in a bucket. So the search for
the optimal step size includes an inner optimization loop on the choice of the
anchor to minimize the maximum bucket occupation. The construction of the
(locally) optimal step and anchor can be done efficiently in time close to linear,
up to polylog terms (see Section 2.2). The algorithm works in two main phases: in
the first phase we build a tree that aims at using small space at the expense of a
few long paths; in the second phase the long paths are shortened by compressing them.
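To make the grid idea concrete, the following Python sketch implements only a single level of the scheme: it is not the AST (no recursion, no optimized choice of step or anchor), and the class and method names are hypothetical. Labeled points are bucketed by floor((p - a) / s) with s = 2^k, computed as a right shift; a query finds its bucket in O(1) and then searches inside it, falling back to the label in force at the left edge of the bucket when no point of the bucket precedes the query.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

class OneLevelGrid:
    """Hypothetical single-level grid over labeled points (not the full AST).

    A query q returns the label of its predecessor max{p : p <= q}, or `default` if none.
    """

    def __init__(self, labeled: List[Tuple[int, str]], anchor: int, log2_step: int,
                 default: Optional[str] = None) -> None:
        labeled = sorted(labeled)                       # distinct points assumed, non-empty
        self.anchor, self.shift, self.default = anchor, log2_step, default
        offsets = [self._offset(p) for p, _ in labeled]
        self.first = offsets[0]
        n_buckets = offsets[-1] - self.first + 1
        self.keys: List[List[int]] = [[] for _ in range(n_buckets)]
        self.labels: List[List[str]] = [[] for _ in range(n_buckets)]
        for (p, label), o in zip(labeled, offsets):
            self.keys[o - self.first].append(p)
            self.labels[o - self.first].append(label)
        # Label in force at the left edge of each bucket (predecessor in an earlier bucket).
        self.entry: List[Optional[str]] = []
        current = default
        for bucket_labels in self.labels:
            self.entry.append(current)
            if bucket_labels:
                current = bucket_labels[-1]
        self.last_label = current

    def _offset(self, p: int) -> int:
        # floor((p - anchor) / 2**log2_step): the division is an arithmetic right shift
        return (p - self.anchor) >> self.shift

    def query(self, q: int) -> Optional[str]:
        i = self._offset(q) - self.first                # O(1) bucket computation
        if i < 0:
            return self.default
        if i >= len(self.keys):
            return self.last_label
        j = bisect_right(self.keys[i], q)               # stand-in for the in-bucket scan
        return self.labels[i][j - 1] if j > 0 else self.entry[i]
```

Empty buckets cost one list slot each, which is why the step is chosen so that the ratio of empty to occupied buckets stays below a threshold.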

During the construction a basic sub-task is finding a grid optimizing a local
measure. This approach should be contrasted with several techniques in the
literature where a global optimality criterion is sought, usually via much more
expensive dynamic programming techniques, such as in [8], [9], [10], and [7].
Ioannidis, Grama and Atallah [11] recently proposed a global reconstruction method
for tries that relies on a reduction to the knapsack problem and therefore suggests
the use of knapsack approximation methods. In [11] the target is to reduce the
average query time based on traffic statistics, not the worst case query time.

A simple theoretical analysis of the asymptotic worst case query time and storage
for the AST is given in [12], yielding asymptotic worst case bounds of O(n) for
storage and O(log n) for query time. However, the query time bound does not
explain the observed performance, and we leave it as an open problem to produce
a more refined analysis considering a variety of models.



2.2 Construction of the AST 

Here we describe the AST data structure by iteratively refining a general
framework with specific choices. Consider initially the set $U = [0, \ldots, 2^w - 1]$
of all possible addresses, the set $P$ of prefixes, and the set $S = S_P$ of end-points
of the segments obtained by mapping each prefix $b \in P$ to the set of addresses of
$U$ matching $b$ as described above. We will often think of $U$ as embedded in the
real line $\mathbb{R}$. The main idea is to build a tree recursively by levels: each node
$x$ of the tree has an associated connected subset of the universe $U_x \subseteq U$,
which is the set of all queries that visit node $x$, and a point data set $S_x \subseteq S$,
which is $S \cap U_x$. For the root $r$ we have $U_r = U$ and $S_r = S$. Let $x$ be a node
and let $k_x$ be the number of children of $x$. By $y[1], \ldots, y[k_x]$ we denote the
children of $x$. A particular rule (to be described below) will tell us how to decide
$k_x$ and, for each $i \in [1, k_x]$, how to compute $U_{y[i]}$. Afterwards simply
$S_{y[i]} = U_{y[i]} \cap S_x$.

For each child $y[i]$, if $S_{y[i]}$ is empty we store at $y[i]$ the next-hop index of
the shortest segment covering $U_{y[i]}$ and we stop the recursion. For each child
$y[i]$, if $S_{y[i]}$ has one point we store at $y[i]$ the next-hop indices for the two
intervals to the left and right of that point, and we stop the recursion. For
simplicity we explain the process at the root. Let $S$ be a set of points on the real
line, and $\mathrm{span}(S)$ the smallest interval containing $S$. Consider an infinite
grid $G(a, s)$ of anchor $a$ and step $s$, $G(a, s) = \{a + ks : k \in \mathbb{Z}\}$. By
shifting the anchor by $s$ to the right, the grid remains unchanged:
$G(a, s) = G(a + s, s)$. Consider the continuous movement of a grid:
$f_a(\alpha) = G(a + \alpha s, s)$ for $\alpha \in [0, 1]$. We call an event a value of
$\alpha$ for which $S \cap f_a(\alpha) \neq \emptyset$, i.e. a grid line passes over a point of $S$.
Lemma 1. The number of events is at most n. 

Proof. Take a single fixed interval $I$ of $G(a, s)$: we have an event when its moving
left extreme meets a point in $I$, and this can happen only once for each point in $I$.
Therefore overall there are at most $n$ events. □

Since we study bucket occupancy, extending the shift range beyond 1 is useless:
given the periodic nature of the grid, every distribution of points in the buckets
has already been considered. Consider a point $p \in S$ and the bucket $I_p$
containing $p$. Point $p$ produces an event when $\alpha_p = (p - \mathrm{left}(I_p))/s$, that
is, when the shift $\alpha s$ equals the distance between $p$ and the left extreme of the
interval containing $p$. Thus we can generate the events in order by constructing a
min-priority queue $Q(S, \alpha_p)$ on the set $S$, using $\alpha_p$ as priority. We can
iteratively extract the minimum from the queue and update the counters $c(I)$ of
the shifting intervals $I$. Note that for our counters an event consists of decreasing
the count of one bucket and increasing it for the neighbouring bucket. Moreover,
we keep the current maximum of the counters. To do so we keep a second,
max-priority queue $Q(I, c(I))$: when a counter increases we apply the operation
increase-key, and when it decreases we apply the operation decrease-key. Finally,
we record the changes in the root of this priority queue, recording the minimum
value found during the lifetime of the algorithm. This value is

$$g(s) = \min_{\alpha \in [0,1]} \; \max_{I \in G(a + \alpha s,\, s)} |S \cap I|,$$

that is, we find the shift that, for a given step $s$, minimizes the maximal occupancy.
The whole algorithm takes $O(n \log n)$ time. In order to use a binary search
scheme we need a monotonicity property, which we prove next.
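A Python sketch of the sweep that computes g(s) for one step s, under the assumption of integer points, so that shifts can be represented by integers in [0, s). The event of p is its offset (p - a) mod s; at that event p moves from its bucket to the left neighbour. Since Python's heapq has no decrease-key, the max-priority queue on the counters is replaced here by a lazy max-heap (stale entries are discarded when they reach the top), a common substitute; the function name is illustrative.

```python
import heapq
from collections import Counter, defaultdict
from typing import List, Tuple

def min_max_occupancy(points: List[int], a: int, s: int) -> Tuple[int, int]:
    """Sweep G(a + shift, s) over shift in [0, s); return (g(s), a minimizing shift).

    Integer points assumed; bucket k of the shifted grid is
    [a + shift + k*s, a + shift + (k+1)*s).
    """
    count = Counter((p - a) // s for p in points)      # occupancy at shift 0
    heap = [(-c, k) for k, c in count.items()]          # lazy max-heap on the counters
    heapq.heapify(heap)

    def current_max() -> int:
        while -heap[0][0] != count[heap[0][1]]:         # drop stale entries
            heapq.heappop(heap)
        return -heap[0][0]

    # Event of p: the moving left edge of p's bucket passes p at shift (p - a) mod s.
    events = defaultdict(list)
    for p in points:
        events[(p - a) % s].append((p - a) // s)

    best, best_shift = current_max(), 0
    for d in sorted(events):                             # events in min-priority order
        for k in events[d]:
            count[k] -= 1                                # p leaves bucket k ...
            count[k - 1] += 1                            # ... and enters its left neighbour
            heapq.heappush(heap, (-count[k], k))
            heapq.heappush(heap, (-count[k - 1], k - 1))
        m = current_max()                                # configuration for shifts just above d
        if m < best:
            best, best_shift = m, d + 1
    return best, best_shift
```

Each point triggers one event and O(log n) heap work, matching the O(n log n) bound stated above.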

Lemma 2. Take two step values $s$ and $t$ with $t = 2s$; then $g(t) \ge g(s)$.

Proof. Consider the grid $G_{\min}(t)$ that attains the min-max occupancy $K = g(t)$,
so every bucket of $G_{\min}(t)$ has at most $K$ elements. Now consider the grid
$G(s)$ that splits exactly in two every bucket of $G_{\min}(t)$. In this grid $G(s)$ the
maximum occupancy is at most $K$, so the value $g(s)$, which minimizes the
maximum occupancy over the translates of $G(s)$, cannot be larger than $K$, i.e.
$g(s) \le K = g(t)$. □

Now if we use only powers of two as possible values for the grid step, the above
monotonicity lemma implies that, for a given threshold $x$, we can use binary
search to find the largest step $s = 2^i$ with $g(s) < x$ and the smallest step
$u = 2^j$ with $g(u) > x$. Call $R = E/F$ the ratio of the number of empty buckets to
the number of full buckets.
Lemma 3. Take two step values $s$ and $t$ with $t = 2s$; then
$R_{\min}(t) \le R_{\min}(s)$.

Proof. Take the grid $G_{\min}(s)$, the grid of step $s$ minimizing the ratio
$R_{\min}(s) = E_s/F_s$. Now make the grid $G(t)$ by pairing adjacent buckets.
Writing $N_s$ and $N_t$ for the numbers of buckets of the two grids, we have the
relations $N_s = N$, $N_t = N/2$, $F_s/2 \le F_t \le F_s$, and
$(E_s - F_s)/2 \le E_t \le E_s/2$. Now we express $R_t = E_t/F_t$ as a function of $N$
and $F_t$, using $E_t = N_t - F_t$:

$$R_t = \frac{E_t}{F_t} = \frac{N_t}{F_t} - 1 = \frac{N}{2F_t} - 1.$$

This is an arc of hyperbola (in the variable $F_t$) attaining its maximum value at
the abscissa $F_t = F_s/2$. The value of the maximum is $N/F_s - 1 = E_s/F_s = R_{\min}(s)$.
Thus we have shown that $R(t) \le R_{\min}(s)$. Naturally also $R_{\min}(t) \le R(t)$, so
we have proved $R_{\min}(t) \le R_{\min}(s)$. Thus the minimum ratio is monotonically
increasing as grids get finer and finer. □

By the above lemmas we have two functions $g(\cdot)$ and $R_{\min}(\cdot)$ that are both
monotone in the step size (one increasing, one decreasing) and that we can
compute efficiently.

Phase I: Saving Space. In order to control memory consumption we adopt
the following criterion, recursively and separately at each node: find the smallest
$s$ with $R(s) \le C$, for a predefined constant threshold $C$, then take the grid
attaining the min-max occupancy $g(s)$ (this shifting does not change the number
of nodes at the level). When the number of points in a bucket drops below a
(small) threshold $2D$ we stop the recursion and switch to standard sequential
search.
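The Phase I rule at a single node can be sketched in Python as follows. Assumptions not taken from the paper: integer points in [0, 2^w), R measured over the buckets that meet span(S), and both R_min(s) and g(s) obtained by brute force over all integer shifts (a simple stand-in for the sweep described earlier). The binary search over exponents relies on the monotonicity of R_min stated in Lemma 3; function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def ratio(points: List[int], anchor: int, s: int) -> float:
    """R = E/F over the buckets of grid (anchor, s) that meet span(S)."""
    occupied = Counter((p - anchor) // s for p in points)
    full = len(occupied)
    empty = (max(occupied) - min(occupied) + 1) - full
    return empty / full

def r_min(points: List[int], s: int) -> float:
    # Brute force over integer shifts; adequate for integer points in a sketch.
    return min(ratio(points, d, s) for d in range(s))

def choose_grid(points: List[int], C: float, w: int) -> Tuple[int, int, int]:
    """Smallest s = 2**i with R_min(s) <= C, then the anchor minimizing g(s).

    Returns (step, anchor, max occupancy). Binary search is valid because
    R_min is non-increasing in the step (Lemma 3).
    """
    lo, hi = 0, w                      # s = 2**w gives R = 0 for points in [0, 2**w)
    while lo < hi:
        mid = (lo + hi) // 2
        if r_min(points, 1 << mid) <= C:
            hi = mid
        else:
            lo = mid + 1
    s = 1 << lo
    g, anchor = min((max(Counter((p - d) // s for p in points).values()), d)
                    for d in range(s))
    return s, anchor, g
```

A larger threshold C admits finer grids (more empty buckets, fewer points per bucket); a smaller C saves space at the price of fuller buckets.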

Phase II: Selective Reconstruction. The tree built in Phase I uses linear
space but can have some long root-to-leaf paths. The purpose of the second
phase is to reduce the length of the long paths without paying too much in
storage.

An access to RAM is about ten times slower than an access to the L2 cache.
Thus it is fundamental to fit all the relevant data into the L2 cache. It is
reasonable to suppose that the target machine has 1 MB of L2 cache and that
it is completely devoted to IP table lookup. Under this hypothesis the size of the
routing table makes no difference as long as it stays below 1 MB. Once the
routing table is built, we can perform a selective reconstruction of the deepest
paths to flatten the AST, in order to reduce the worst query time while keeping
the total memory consumption below 1 MB.

The cost of a node. Each node of the AST has an associated cost, which
represents the time of the slowest query among all the queries that visit that node.
Thus the cost of an empty leaf is the time to query a point inside the space of
that leaf. The cost of a full leaf is the time to reach the last point stored inside
it (i.e. the time to visit all the points from the median to the farthest extreme).
The cost of an internal node is the maximum of the costs of all its children.
The leaves can be subdivided into classes (l, p, a) depending on their level in
the AST (l), the number of points from the median to the farthest extreme (p), and
the number of bits used to store the points in the leaf (a: 8, 16 or 32). To guide the
selective reconstruction, a Clock-ticks Table is built once for each CPU
architecture. For all the possible classes of leaves, the table stores the corresponding
query time. The query time of a particular class is measured in the following
way: a simple AST containing nodes of the chosen class is built, several queries
reaching nodes of that class are performed, and the minimum time is taken and
normalized to L2 cache. The rationale is that a machine dedicated to IP table
lookup can perform a query in the minimum time measured on the test machine,
because the processor is not disturbed by other running processes. The
measurements were made using Pentium-specific low-level primitives to attain
higher precision; nothing prevents the use of operating system primitives for
better portability.

Selective Reconstruction. The following steps are repeated while the size of
the routing table is below 1 MB and there are other reconstructions to perform:
(i) create a max-priority queue of internal nodes based on the maximum cost of
their children; (ii) visit the AST and consider only internal nodes having only
leaves as children: determine the cost of these nodes and insert them in the
max-priority queue; (iii) extract from the queue the nodes with the same maximum
cost and flatten them in the following way: if the maximum number of points in
a child full leaf is greater than 1, then split the step until the maximum number
of points in a bucket becomes smaller than the current maximum; otherwise go to
the parent and split its step until the maximum number of points in a bucket
becomes smaller than the maximum number of points a full leaf can contain (in
this way the level below is eliminated).

A few exceptions to the above rules: the root is not rebuilt and, for simplicity,
when a level is eliminated, other reconstructions of the same cost are performed
only after recomputing the priority queue.
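The selection part of steps (i)-(iii) can be sketched in Python as follows; the actual flattening of the selected nodes is omitted. Leaf costs are looked up in a clock-ticks table, here filled with the level-2 and level-3 columns of Table 1 (8-bit points, position 0 standing for an empty leaf). The Node class and function names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Clock ticks keyed by (level, position); position 0 = empty leaf. Values from Table 1.
CLOCK_TICKS: Dict[Tuple[int, int], int] = {
    (2, 0): 20, (2, 1): 30, (2, 2): 35, (2, 3): 35, (2, 4): 40,
    (3, 0): 42, (3, 1): 65, (3, 2): 67, (3, 3): 64, (3, 4): 69,
}

@dataclass(eq=False)
class Node:
    level: int
    position: Optional[int] = None      # leaves only: points from the median to the farthest extreme
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def cost(node: Node) -> int:
    """Leaf cost = clock ticks of its class; internal node cost = max cost of its children."""
    if node.is_leaf():
        return CLOCK_TICKS[(node.level, node.position)]
    return max(cost(c) for c in node.children)

def most_expensive_candidates(root: Node) -> Tuple[Optional[int], List[Node]]:
    """Internal nodes (root excluded) whose children are all leaves, ranked by cost;
    returns the maximum cost and every candidate attaining it."""
    heap: List[Tuple[int, int, Node]] = []
    tie = itertools.count()              # tie-breaker so Node objects are never compared
    stack = [root]
    while stack:
        x = stack.pop()
        if x.is_leaf():
            continue
        if x is not root and all(c.is_leaf() for c in x.children):
            heapq.heappush(heap, (-cost(x), next(tie), x))
        else:
            stack.extend(x.children)
    if not heap:
        return None, []
    worst = -heap[0][0]
    selected = []
    while heap and -heap[0][0] == worst:
        selected.append(heapq.heappop(heap)[2])
    return worst, selected
```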



3 Experiments with AST 

The testing of the (empirical) worst case query time is done statically on the
structure of the tree, by finding the longest (most expensive) root-to-leaf path.
The worst case query time gives valuable information that is relevant for, but
independent of, any actual query data stream. Table 2 gives an overview of the
IP lookup tables used for testing the AST data structure. For each table we report
a progressive number, the name of the organization or router holding the table,
the date when the table was downloaded and the number of prefixes in the table.
Each prefix generates two points; we then classify the points into active, phantom
and duplicate points, giving the count for each category. Table 4 gives the relevant
measures for the Slim-AST. The Slim-AST is built only on the active points
(without duplicates and phantom points) and it suffices to answer queries.
Updates are supported with the help of an auxiliary data structure that also
records the missing data (duplicates and phantoms).
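The point classification behind Table 2 can be sketched in Python on a toy w-bit universe. Assumptions, not taken from the paper: prefixes are given as hypothetical (bits, length, next hop) triples; a prefix's segment is the half-open range [lo, hi), so its two points are the boundaries lo and hi; a repeated boundary counts as a duplicate, and a distinct boundary is a phantom when the longest-prefix label just left of it equals the label at it. Labels are computed by brute force, which is fine for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Prefix = Tuple[int, int, str]   # (prefix bits as an integer, prefix length, next hop)

def segment(bits: int, length: int, w: int) -> Tuple[int, int]:
    """Half-open address range [lo, hi) matched by a prefix in a w-bit universe."""
    lo = bits << (w - length)
    return lo, lo + (1 << (w - length))

def lpm_label(addr: int, prefixes: List[Prefix], w: int) -> Optional[str]:
    """Next hop of the longest prefix (= shortest segment) containing addr; brute force."""
    best_len, best_hop = -1, None
    for bits, length, hop in prefixes:
        lo, hi = segment(bits, length, w)
        if lo <= addr < hi and length > best_len:
            best_len, best_hop = length, hop
    return best_hop

def classify_points(prefixes: List[Prefix], w: int) -> Dict[str, int]:
    universe = 1 << w
    seen = set()
    counts = {"active": 0, "phantom": 0, "duplicate": 0}
    for bits, length, _ in prefixes:
        for p in segment(bits, length, w):          # each prefix yields two points
            if p in seen:
                counts["duplicate"] += 1
                continue
            seen.add(p)
            left = lpm_label(p - 1, prefixes, w) if p > 0 else None
            right = lpm_label(p, prefixes, w) if p < universe else None
            counts["phantom" if left == right else "active"] += 1
    return counts
```

For example, with 8-bit addresses, the prefixes 1/1 -> A, 10/2 -> B, 11/2 -> A and 1100/4 -> A yield 3 active points, 1 phantom point (the end of 1100/4, where A continues on both sides) and 4 duplicates.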

Normalization in L2. To test the AST data structure we adopt the methodology
of [13], which we describe for completeness. We visit each node of the AST twice
so as to force the data for the second invocation into the L1 cache (this is known
as the hot cache model). We measure the number of clock ticks of the second
invocation by reading the appropriate internal CPU registers. We also record the
number of L1 memory accesses of each query. This is important because, knowing
the L1 latency, we can decompose the measured time into CPU-operation cost and
memory access cost. Afterwards we scale the memory access cost to the cost of
accessing data in the L2 cache, unless by locality we can argue that the data must
be in the L1 cache also on the target machine. We call this measurement model
the L2-normalized model. Since the root is accessed in each query, after unrolling
the search loop we store the step, the anchor and the pointer to the first child of
the root in registers, so the access to the root is treated separately from accessing
any other level of the AST. The results of the L2-normalized measures are shown
in Table 1 for 8-bit keys; access times for 16- and 32-bit keys are almost identical.
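One possible reading of the L2-normalization arithmetic, as a Python sketch: subtract the L1 cost of the hot-cache memory accesses from the measured ticks to isolate the CPU cost, then charge each access at L2 latency unless locality guarantees an L1 hit on the target. How many accesses stay in L1 is an input we assume the caller supplies; the default latencies are the test-machine values from the table captions (T = 1.4 ns, D = 15 ns, m = 4 ns), and the function name is hypothetical.

```python
T_NS, L2_NS, L1_NS = 1.4, 15.0, 4.0   # test machine: clock period T, L2 delay D, L1 latency m

def l2_normalized_ticks(measured_ticks: float, mem_accesses: int, l1_on_target: int = 0,
                        t_ns: float = T_NS, l2_ns: float = L2_NS, l1_ns: float = L1_NS) -> float:
    """Rescale a hot-cache measurement so that memory accesses are charged at L2 latency.

    measured_ticks: clock ticks of the second (hot-cache) invocation, all accesses in L1.
    mem_accesses:   number of memory accesses made by the query.
    l1_on_target:   how many of them locality guarantees to be L1 hits on the target.
    """
    m = l1_ns / t_ns                                   # L1 latency in ticks
    M = l2_ns / t_ns                                   # L2 latency in ticks
    cpu_ticks = measured_ticks - mem_accesses * m      # remove the L1 memory cost
    memory_ticks = (mem_accesses - l1_on_target) * M + l1_on_target * m
    return cpu_ticks + memory_ticks
```

Because the CPU cost is obtained by subtraction, small measurement errors can make a normalized value come out slightly off, or even negative for very cheap classes.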

Target architecture. Here we derive a formula that we will use to scale our results
to other architectures. The target architecture is made of a processor P backed by
two caches, an L1 cache and an L2 cache. We suppose every memory access is an
L1 miss, except when we access consecutive cells. Calling $M$ the L2 access time
and $m$ the L1 access time, the general formula for accessing a point of class
$(k, s)$ is $A + k(M + m + B) + (M + m) + (s - 1)(C + m)$, where the terms account,
respectively, for any initial constant, for accessing the intermediate nodes, for
accessing the leaf and for sequentially accessing the points stored at the leaf.
Taking the measurements on our machine and the known values of $M$ and $m$, we
determine the constants $A$, $B$, $C$. Since for the Slim-AST we only have results
for classes (2, 1) to (2, 4) of Table 1, we need accuracy only in this range. Via easy
calculations, the final interpolation formula for any (2, s) class is
$3(M + m) + (s - 1)m$.
In Table 3 we compare the AST (slim) method with three other methods for
which analogous formulae have been derived by the respective authors (explicitly
or implicitly), mapped onto a common architecture. The method in [13] is by
far the most storage efficient, using roughly 4 bytes per entry; it is based on
tree compression techniques, which make updates difficult. Its query time is
somewhat high. The Expansion/Compression method [14] is very fast (it requires
only 3 memory accesses) but uses a fair amount of memory and requires extensive
reconstructions of the data structure at each update. The method of [10] balances
query time, storage and update time well. Its performance for tables of 40,000
entries is not far from that of the AST. It is not clear, however, how the balance
of resource utilization would scale on tables three times as large. Clearly, as
processors become faster the time is dominated by the memory access time, so
our method and [14] converge to roughly the same query time; however, our use
of storage seems more efficient.
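The following Python sketch evaluates the per-class access formula and the Table 3 formulas with the test-machine parameters (T = 1.4 ns, D = M = 15 ns, m = 4 ns). The fitted constants A, B, C are not reported in the text, so they are left as parameters defaulting to 0; the sketch only shows that with k = 2 and A = B = C = 0 the general formula reduces to the interpolation formula 3(M + m) + (s - 1)m, and that the Table 3 formulas reproduce its Time column after rounding.

```python
from typing import Callable, Dict

T_NS, D_NS, L1_NS = 1.4, 15.0, 4.0   # clock period T, L2 delay D (= M), L1 latency m

def access_time(k: int, s: int, M: float, m: float,
                A: float = 0.0, B: float = 0.0, C: float = 0.0) -> float:
    """A + k(M + m + B) + (M + m) + (s - 1)(C + m): initial constant, k intermediate
    nodes, the leaf itself, then s - 1 consecutive points scanned inside the leaf."""
    return A + k * (M + m + B) + (M + m) + (s - 1) * (C + m)

def interpolated(s: int, M: float, m: float) -> float:
    """Interpolation formula for the classes (2, s): 3(M + m) + (s - 1)m."""
    return 3 * (M + m) + (s - 1) * m

# Time formulas of Table 3, as functions of (T, D, m)
TABLE3: Dict[str, Callable[[float, float, float], float]] = {
    "Lulea [13]":           lambda T, D, m: 53 * T + 12 * D,
    "Variable stride [10]": lambda T, D, m: 26 * T + 3 * D,
    "Exp./Con. [14]":       lambda T, D, m: 3 * T + 3 * D,
    "Slim AST":             lambda T, D, m: m + 3 * D,
}

if __name__ == "__main__":
    for name, formula in TABLE3.items():
        print(f"{name:22s} {formula(T_NS, D_NS, L1_NS):6.1f} ns")
```

Plugging in the parameters of another machine gives the kind of qualitative cross-architecture comparison the text describes.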
4 Dynamic Operations

Description of the dynamic operations. We give a brief sketch of the dynamic
operations. Inserting a new prefix s involves two phases: (1) inserting the
end-points of the corresponding segment M(s); (2) updating the next-hop index
for all the leaves that are covered by the segment M(s). While phase (1) involves
only two searches and local restructuring, phase (2) involves a DFS of the portion
of the AST dominated by the inserted/deleted segment. Such a DFS is
potentially an expensive operation, requiring in the worst case the visit of the
whole tree. However, we can trim the search by observing that it is not necessary
to visit sub-trees whose span is included in a segment that is itself included in
M(s), since the next hop of these nodes does not change. There are n prefixes and
O(n) leaves, and the span of each leaf is intersected without being enclosed by at
most 2D + 1 segments; therefore only O(1) leaves need to be visited on average and
possibly updated. Deleting prefix s from the AST involves phases similar to
insertion; however, we first need to determine the segment in the AST including
M(s) in order to perform the leaf re-labeling. This is done by performing at most
w searches, one for each prefix of s, and selecting the longest such prefix present
in the AST. Although it is possible to check and enforce the other AST invariants
at each update, we have noticed that the query performance remains stable
over long sequences of random updates. Therefore enforcing the AST invariants
is best left to an off-line periodic restructuring process.
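A Python sketch of the two insertion phases, using a flat sorted list of elementary intervals as a hypothetical stand-in for the AST leaves (so there is no tree and no pruned DFS). Phase (1) adds the end-points of M(s); phase (2) relabels only the intervals inside M(s) whose current label comes from a segment containing M(s), leaving intervals owned by longer prefixes untouched. Deletion is not shown, and the class name is illustrative.

```python
import bisect
from typing import List, Optional, Tuple

class ElementaryIntervals:
    """Hypothetical flat stand-in for the AST leaves: sorted boundaries of a w-bit
    universe; interval [bounds[i], bounds[i+1]) carries (prefix length, next hop)."""

    def __init__(self, w: int) -> None:
        self.w = w
        self.bounds: List[int] = [0, 1 << w]
        self.labels: List[Tuple[int, Optional[str]]] = [(-1, None)]   # -1: no covering prefix

    def _add_point(self, x: int) -> None:
        # Phase (1): make x a boundary by splitting the interval that contains it.
        i = bisect.bisect_right(self.bounds, x) - 1
        if self.bounds[i] != x:
            self.bounds.insert(i + 1, x)
            self.labels.insert(i + 1, self.labels[i])

    def insert(self, bits: int, length: int, hop: str) -> None:
        lo = bits << (self.w - length)
        hi = lo + (1 << (self.w - length))               # M(s) = [lo, hi)
        self._add_point(lo)
        self._add_point(hi)
        # Phase (2): relabel intervals inside M(s) whose label comes from a segment
        # containing M(s); intervals owned by longer prefixes are skipped.
        i = bisect.bisect_left(self.bounds, lo)
        while i < len(self.labels) and self.bounds[i] < hi:
            if self.labels[i][0] <= length:
                self.labels[i] = (length, hop)
            i += 1

    def lookup(self, addr: int) -> Optional[str]:
        return self.labels[bisect.bisect_right(self.bounds, addr) - 1][1]
```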

Experimental data. We perform experiments on updating the AST using the full
content of each table as the update data stream. For deletions this is natural,
since we can delete only prefixes present in the table. For insertions, we argue
that entries in the table have been inserted in the past, so it is reasonable to use
them as representative of future updates as well. Table 5 shows the number
of memory accesses and the percentile time in µs for completing 95%, 99% and
100% of the updates (insertions) for the AST data structures built for the lookup
tests. Experiments for deletion give almost identical counts. During the update
we count the number of memory cells accessed, distinguishing accesses to L2 from
those to L1, assuming the data structure is stored completely in the L2 cache.
Since in the counting we adopt a conservative policy, i.e. we upper bound the
number of cells accessed at each node visited during the update, the final count
represents an upper bound on the actual number of memory accesses. As is to be
expected, the shortest prefixes, corresponding to the longest segments, require a
visit of larger portions of the tree and are responsible for the slowest updates.

Comparison with known methods. In [10] an estimate of 2500 µs for worst
case updates is given, counting the dominant cost of the memory accesses. The
cost is incurred when a node in the trie with a large out-degree (2^^) needs to be
rebuilt, and an access cost of 20 nsec per entry is assumed. Since 24-bit prefixes
are very frequent (about 50% of the entries), the variable stride tries are constrained
so that updates for such lengths are faster, requiring roughly 3 µs. Although our
machine is faster than the one used in [10], the L2 access time is quite similar.
Table 5. Slim-AST data structure performance of insertion. Number of memory
accesses to L2 and L1 cache for the 100th, 99th and 95th percentiles, and time
estimate on the test machine. Tested on a 700 MHz Pentium III (T = 1.4 ns) with
1024 KB L2 Cache, L2 delay is D = 15 nsec, L1 latency m = 4 nsec.

                 100%                   99%                    95%
Routing Table    Accesses     Time      Accesses     Time      Accesses     Time
                 L2     L1    (µs)      L2     L1    (µs)      L2     L1    (µs)
1
Therefore the timings in [10] can be compared (although only on a qualitative
basis) with the results in Table 5. We believe that the 2500 µs worst case estimate
for [10] is overly pessimistic: in our experiments (deletion and insertion of every
entry of every table) we never had to rebuild nodes of very high degree. Updates
in [13] and [14] are handled by rebuilding the data structure from scratch, thus
requiring time of the order of several milliseconds for every update.



5 Conclusions and Open Problems 



Preliminary experiments with the AST give encouraging results when compared
with published results for state-of-the-art methods; however, fine tuning of the
AST requires properly choosing some user-defined parameters, and at the moment
we do not have general rules for this choice. Finding a theoretical underpinning
that explains the good empirical performance of the AST is still an open issue for
further research. A second open line of research is the integration of the AST
techniques for empirical efficiency within the framework of the Fat Inverted
Segment tree, which presently yields good worst case asymptotic bounds.



Acknowledgments. We thank Giorgio Vecchiocattivi, who contributed to testing
an early version of the AST data structure [12], and the participants of the
mini-workshop on routing (held in Pisa in March 2003) for many interesting
discussions.



Efficient IP Table Lookup via Adaptive Stratified Trees 783 



References 

1. McKeown, N.: Hot interconnects tutorial slides. Stanford University, available at
http://klamatli.stanford.edu/talks/ (1999)

2. Suri, S., Varghese, G., Warkhede, P.: Multiway range trees: Scalable IP lookup with
fast updates. Technical Report 99-28, Washington University in St. Louis, Dept. of
Computer Science (1999)

3. Lampson, B.W., Srinivasan, V., Varghese, G.: IP lookups using multiway and
multicolumn search. IEEE/ACM Transactions on Networking 7 (1999) 324-334

4. Feldmann, A., Muthukrishnan, S.: Tradeoffs for packet classification. In: Proceedings
of INFOCOM, volume 3, IEEE (2000) 1193-1202

5. Thorup, M.: Space efficient dynamic stabbing with fast queries. In: Proceedings
of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003,
San Diego, CA, USA, ACM (2003) 649-658

6. Kaplan, H., Molad, E., Tarjan, R.E.: Dynamic rectangular intersection with priorities.
In: Proceedings of the Thirty-Fifth ACM Symposium on Theory of Computing,
ACM Press (2003) 639-648

7. Buchsbaum, A.L., Fowler, G.S., Krishnamurthy, B., Vo, K.P., Wang, J.: Fast prefix
matching of bounded strings. In: Proceedings of ALENEX 2003. (2003)

8. Cheung, G., McCanne, S.: Optimal routing table design for IP address lookups
under memory constraints. In: INFOCOM (3). (1999) 1437-1444

9. Gupta, P., Prabhakar, B., Boyd, S.P.: Near optimal routing lookups with bounded
worst case performance. In: INFOCOM (3). (2000) 1184-1192

10. Srinivasan, V., Varghese, G.: Fast address lookups using controlled prefix expansion.
ACM Transactions on Computer Systems (1999) 1-40

11. Ioannidis, I., Grama, A., Atallah, M.: Adaptive data structures for IP lookups. In:
INFOCOM '03. (2003)

12. Pellegrini, M., Fusco, G., Vecchiocattivi, G.: Adaptive stratified search trees for IP
table lookup. Technical Report TR IIT 22/2002, Istituto di Informatica e Telematica
del CNR (IIT-CNR), Pisa, Italy (2002)

13. Degermark, M., Brodnik, A., Carlsson, S., Pink, S.: Small forwarding tables for
fast routing lookups. In: SIGCOMM. (1997) 3-14

14. Crescenzi, P., Dardini, L., Grossi, R.: IP address lookup made fast and simple. In:
European Symposium on Algorithms. (1999) 65-76



Super Scalar Sample Sort 



Peter Sanders¹ and Sebastian Winkel²



¹ Max Planck Institut für Informatik
Saarbrücken, Germany, sanders@mpi-sb.mpg.de
² Chair for Prog. Lang. and Compiler Construction
Saarland University, Saarbrücken, Germany, sewi@cs.uni-sb.de



Abstract. Sample sort, a generalization of quicksort that partitions the
input into many pieces, is known as the best practical comparison based
sorting algorithm for distributed memory parallel computers. We show
that sample sort is also useful on a single processor. The main algorithmic
insight is that element comparisons can be decoupled from expensive
conditional branching using predicated instructions. This transformation
facilitates optimizations like loop unrolling and software pipelining. The
final implementation, albeit cache efficient, is limited by a linear number
of memory accesses rather than by the O(n log n) comparisons. On an
Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from
the GCC STL library, which is known as one of the fastest available
quicksort implementations.



1 Introduction 

Counting comparisons is the most common way to compare the complexity of 
comparison based sorting algorithms. Indeed, algorithms like quicksort with good 
sampling strategies [9,14] or merge sort perform close to the lower bound of 
log n! ≈ n log n comparisons¹ for sorting n elements and are among the best
algorithms in practice (e.g., [24,20,5]). At least for numerical keys it is a bit
astonishing that comparison operations alone should be good predictors for execution
time because comparing numbers is equivalent to subtraction — an operation of 
negligible cost compared to memory accesses. 

Indeed, at least when sorting large elements, it is well known that mem- 
ory hierarchy effects dominate performance [18]. However, for comparison based 
sorting of small elements on processors with high memory bandwidth and long 
pipelines like the Intel Pentium 4, there are almost no visible memory hierar- 
chy effects [5]. Does that mean that comparisons dominate the execution time 
on these machines? Not quite, because execution time divided by n log n gives
about 8 ns in these measurements. But in 8 ns the processor can in principle
execute around 72 subtractions (24 cycles × 3 instructions per cycle). To understand
what the other 71 slots for instruction execution are (not) doing, we apparently 
need to have a closer look at processor architecture [8]: 

¹ log x stands for log₂ x in this paper.



S. Albers and T. Radzik (Eds.): ESA 2004, LNCS 3221, pp. 784-796, 2004.
© Springer-Verlag Berlin Heidelberg 2004
Data dependencies appear when an operand of an instruction B depends on
the result of a previous instruction A. Such instructions have to be one or more
pipeline stages apart.

Conditional branches disrupt the instruction stream and impede the extrac- 
tion of instruction-level parallelism. When a conditional branch enters the first 
stage of the processor’s execution pipeline, its direction must be predicted since 
the actual condition will not be known until the branch reaches one of the later 
pipeline stages. If a branch turns out to be mispredicted, the pipeline must be 
flushed as it has executed the wrong program path. The resulting penalty is 
proportional to the pipeline depth. Six cycles are lost on the Itanium 2 and 20 
on the Pentium 4 (even 31 on the later 90nm generation). This penalty could 
be mitigated by speculative execution of both branches, but this causes additional
costs and power consumption and quickly becomes hopeless when further
branches get in the way.

Of course, the latter problem is relevant for sorting. To avoid mispredicted 
branches, much ingenuity has been invested by hardware architects and compiler 
writers to accurately predict the direction of branches. Unfortunately, all these 
measures are futile for sorting. For fundamental information theoretic reasons, 
the branches associated with element comparisons have a close to 50% chance
of going either way if the total number of comparisons is close to n log n.
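To make this coupling concrete at the code level, here is a minimal C sketch of our own (the function names are hypothetical, not from the paper). Both functions count the elements smaller than a pivot. The first uses the comparison to steer an `if`, which on random input goes either way about half of the time. The second adds the 0/1 value of the comparison to a counter, so it needs no data-dependent branch. Whether a particular compiler emits a conditional jump for the first version, or already if-converts it, depends on the compiler and its flags.

```c
#include <stddef.h>

/* The comparison decides a conditional branch: on random input the branch
   predictor can do little better than guessing. */
size_t count_less_branchy(const int *a, size_t n, int pivot) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < pivot) {
            count++;
        }
    }
    return count;
}

/* Same result, but the comparison is used as data: in C, (a[i] < pivot)
   evaluates to 0 or 1, which is simply added to the counter. */
size_t count_less_branchless(const int *a, size_t n, int pivot) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(a[i] < pivot);
    }
    return count;
}
```

Timing both versions on random and on presorted input is one simple way to observe misprediction cost, although the outcome depends heavily on the CPU and compiler.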

So after discussing a lot of architectural features we are back to square one.
Counting comparisons makes sense because comparisons in all known comparison
based sorting algorithms are coupled to hard to predict branches. The main
motivation for this paper was the question whether this coupling can be avoided,
thus leading to more efficient comparison based sorting algorithms (and to a
reopened discussion of what makes a good practical sorting algorithm).

Section 2 presents super scalar sample sort (sss-sort) — an algorithm where 
comparisons are not coupled to branches. Sample sort [4] is a generalization of 
quicksort. It uses a random sample to find k - 1 splitters that partition the input
into buckets of about equal size. Buckets are then sorted recursively. Using binary 
search in the sorted array of splitters, each element is placed into the correct 
bucket. Sss-sort implements sample sort differently in order to address several 
aspects of modern architectures: 

Conditional branches are avoided by placing the splitters into an implicit 
search tree analogous to the tree structure used for binary heaps. Element place- 
ment traverses this tree. It turns out that the only operation that depends on 
the result of a comparison is an index increment. Such a conditional increment 
can be compiled into a predicated instruction: an instruction that has a predi- 
cate register as an additional input and is executed if and only if the boolean 
value in this register is one. The simplest form of such instructions, conditional
moves, is now available on all modern architectures. If a branch is replaced by these
conditionally executed instructions, it can no longer cause mispredictions that disrupt the
pipeline. Loop control overhead is largely eliminated by unrolling the innermost
loop of log k iterations completely².

Data dependencies: Elements can traverse the search tree for placement into
buckets independently. By interleaving the corresponding instructions for several
elements, we prevent data dependencies from delaying the computation. Using a two-pass
approach, sss-sort makes it easy for the compiler to implement this optimization
(almost) automatically using further loop unrolling or software pipelining. In
the first pass, only an “oracle”, the correct bucket number, is remembered
and the bucket sizes are counted. In the second pass, the oracles and the bucket
sizes are used to efficiently place the elements into preallocated subarrays. This
two-pass approach is well known from algorithms based on radix sort, but only
the introduction of oracles makes it feasible for comparison based sorting. The
two-pass approach has the interesting side effect that the comparisons and the 
most memory intensive components of the algorithm are cleanly separated and 
we can easily find out what dominates execution time. 

Memory Hierarchies: Sample sort is more cache efficient than quicksort because
it moves the elements only log_k n times rather than log n times. The parameter
k depends on several properties of the machine.
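Here is a quick worked example. The numbers are our own illustration: n = 2^24 elements and k = 2^8 = 256, the value of k used in Section 3. If we ignore that both algorithms stop recursing at some base-case size, the number of times each element is moved compares as follows:

```latex
\log_k n = \frac{\log_2 n}{\log_2 k} = \frac{24}{8} = 3
\qquad\text{versus}\qquad
\log_2 n = 24 .
```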

The term “super scalar” in the algorithm name, as in [1], says that this algo- 
rithm enables the use of instruction parallelism by getting rid of hard to predict 
branches and data dependencies. Were it not for the alliteration, “instruction
parallel sample sort” might have been a better name, since superpipelined,
VLIW, and explicitly parallel instruction architectures can also profit from sss-sort.

Section 3 gives results of an implementation of sss-sort on Intel Itanium 2 
and Pentium 4 processors. The discussion in Section 4 summarizes the results 
and outlines further refinements. 



Related Work 

So much work has been published on sorting and so much folklore knowledge is 
important that it is difficult to give an accurate history of what is considered 
the “best” practical sorting algorithm. But it is fair to say that refined versions 
of Hoare’s quicksort [9,17] are still the best general purpose algorithms: being
comparison based, quicksort makes few assumptions about the key type. Choosing
splitters based on random samples of size Θ(√n) [14] gives an algorithm using
only n log n + O(n) expected comparisons. A proper implementation takes only
O(log n) additional space, and a fallback to heapsort ensures worst case efficiency.
In a recent study [5], STL::sort from the GCC STL library fares best among
all quicksort competitors. This routine stems from the HP/SGI STL library and
implements introsort, a quicksort variant described in [17]. Only for large n and
only on some architectures is this implementation outperformed by more cache
efficient algorithms. Since quicksort only scans an array from both ends, it has
perfect spatial locality even for the smallest of caches. Its only disadvantage is

² Throughout this paper, we assume that k is a power of two.
Table 1. Different complexity measures for k-way distribution and k-way merging in
comparison based sorting algorithms. All these algorithms need n log k element
comparisons. Lower order terms are omitted. Branches: number of hard to predict branches;
data dep.: number of instructions that depend on another close-by instruction; I/Os:
number of cache faults assuming the I/O model with block size B; instructions:
necessary size of instruction cache.
that it reads and writes the entire input log(n/M) times before the subproblems fit
into a cache of size M.

Recent work on engineering sequential sorting algorithms has focused on 
more cache efficient algorithms based on divide-and-conquer that split the problem
into subproblems of size about n/k and hence only need to read and write
the input log_k(n/M) times. In the I/O model [2], with an omniscient cache of size M
and accesses (“I/Os”) of size B, we have k = O(M/B). In practical implementations,
k has to be reduced by a factor B^{1/a} for a-way associative hardware
caches [15] (this restriction can in principle be circumvented [23]). On current
machines, the size of the Translation Lookaside Buffer (TLB) is the more stringent
restriction. This translation table keeps the physical address of the most
recently accessed pages of virtual memory. Access to other pages causes TLB
misses. TLBs are small (usually between 64 and 256 entries) and for large n, TLB misses
can get much more expensive than cache misses.

k-way Merging is a simple deterministic approach to k-way divide-and-conquer
sorting and is the most popular approach to cache efficient comparison
based sorting [18,13,22,24,20,5]. Using a tournament tree data structure [12],
the smallest remaining element from k sorted data streams can be selected using
log k comparisons. Keeping this tournament tree in registers reduces the number
of memory accesses by a factor log k at the cost of 2k registers and a cumbersome
case distinction in the inner loop [22,20,24]. Although register based multi-way
merging severely restricts the maximal k, this approach can be a good compromise
for many architectures [20,24]. This is another indication that most sorting
algorithms are less restricted by the memory hierarchy than by data dependencies
and delays for branches. Register based k'-way mergers can be arranged into
a cache efficient k-way funnel merger by coupling log k / log k' levels using buffer arrays
[5]. By appropriately choosing the sizes of the buffer arrays, this algorithm becomes
cache-oblivious, i.e., it works without knowing the I/O-model parameters
M and B and is thus efficient on all levels of the memory hierarchy [7].
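The tournament tree idea can be sketched in C as follows. This is our own illustration under stated assumptions, not one of the cited implementations. We assume int keys, a hypothetical `Stream` type, k a power of two with k ≤ 64, and a winner tree kept in an ordinary array instead of in registers. After each output element, only the log k matches on the path from the winning leaf to the root are replayed. Unlike the locating loop of sss-sort, every comparison here decides which stream index moves up the tree, and exhausted streams need extra case distinctions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream type: a sorted int array and a read cursor. */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
} Stream;

/* 1 if stream x wins the match against stream y; exhausted streams lose. */
static int beats(const Stream *s, int x, int y) {
    if (s[y].pos == s[y].len) return 1;
    if (s[x].pos == s[x].len) return 0;
    return s[x].data[s[x].pos] <= s[y].data[s[y].pos];
}

/* Winner of the subtree rooted at node; nodes k..2k-1 are the leaves (streams). */
static int winner_of(const int *tree, size_t node, size_t k) {
    return node >= k ? (int)(node - k) : tree[node];
}

/* Merges k sorted streams into out and returns the number of elements written.
   tree[1..k-1] stores the stream index that won the match at each inner node. */
size_t kway_merge(Stream *s, size_t k, int *out) {
    assert(k >= 1 && k <= 64 && (k & (k - 1)) == 0);
    int tree[64];
    for (size_t node = k - 1; node >= 1; node--) {      /* build bottom-up */
        int a = winner_of(tree, 2 * node, k), b = winner_of(tree, 2 * node + 1, k);
        tree[node] = beats(s, a, b) ? a : b;
    }
    size_t written = 0;
    for (;;) {
        int w = winner_of(tree, 1, k);
        if (s[w].pos == s[w].len) break;                /* all streams exhausted */
        out[written++] = s[w].data[s[w].pos++];
        /* Replay the log2(k) matches on the path from leaf w to the root. */
        for (size_t node = ((size_t)w + k) / 2; node >= 1; node /= 2) {
            int a = winner_of(tree, 2 * node, k), b = winner_of(tree, 2 * node + 1, k);
            tree[node] = beats(s, a, b) ? a : b;
        }
    }
    return written;
}
```

Many tuned mergers store the losers of each match instead of the winners, which saves looking up the sibling during the replay. The winner variant is shown here only because it is shorter.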
Function sampleSort(e = ⟨e_1, ..., e_n⟩, k)
    if n/k is “small” then return smallSort(e)    // base case, e.g. quicksort
    let ⟨S_1, ..., S_{ak-1}⟩ denote a random sample of e
    sort S    // or at least locate the elements whose rank is a multiple of a
    ⟨s_0, s_1, s_2, ..., s_{k-1}, s_k⟩ := ⟨-∞, S_a, S_{2a}, ..., S_{(k-1)a}, ∞⟩    // determine splitters
    for i := 1 to n do
        find j ∈ {1, ..., k} such that s_{j-1} < e_i ≤ s_j
        place e_i in bucket b_j
    return concatenate(sampleSort(b_1), ..., sampleSort(b_k))

Fig. 1. Pseudocode for sample sort with k-way partitioning and oversampling factor a.
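The sampling step of Fig. 1 might look as follows in C. This sketch rests on our own assumptions, none of which the paper prescribes: int keys, sampling with replacement, a hypothetical xorshift64 generator for the random positions, and `qsort` to sort the sample. As in Fig. 1, the sample has ak - 1 elements and the splitters are the elements of rank a, 2a, ..., (k-1)a.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Hypothetical helper: xorshift64 PRNG, used only to pick sample positions. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Draws a*k - 1 random elements of e[0..n-1], sorts them, and writes the k-1
   splitters s_1..s_{k-1} to splitters[0..k-2]. Returns 0 on success and -1 if
   the sample buffer cannot be allocated. */
int choose_splitters(const int *e, size_t n, size_t k, size_t a,
                     uint64_t seed, int *splitters) {
    assert(n > 0 && k >= 2 && a >= 1 && seed != 0);
    size_t m = a * k - 1;                      /* sample size */
    int *sample = malloc(m * sizeof *sample);
    if (sample == NULL) return -1;
    uint64_t state = seed;
    for (size_t i = 0; i < m; i++)
        sample[i] = e[xorshift64(&state) % n];
    qsort(sample, m, sizeof *sample, cmp_int);
    for (size_t j = 1; j < k; j++)
        splitters[j - 1] = sample[j * a - 1];  /* sample rank j*a (1-based) */
    free(sample);
    return 0;
}
```

With a = 1 and k = 2, the single splitter is just one random element, i.e., the pivot of ordinary randomized quicksort. Larger a trades sampling overhead for more evenly sized buckets.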



For integer keys, radix sort with log k bit digits starting with the most signif- 
icant digit (MSD) can be very fast. This is probably folklore. We single out [1] 
because it contains the first mention of TLB as an important factor in choosing 
k and because it starts with the sentence “The compare and branch sequences 
required in a traditional sort algorithm cannot efficiently exploit multiple execu- 
tion units present in currently available RISC processors.” However, radix sort 
in general and this implementation in particular depend heavily on the length 
and distribution of input keys. In contrast, sss-sort does away with compare and 
branch sequences without assumptions on the input keys. For a recent overview 
of radix sort implementations refer to [19,11]. Sss-sort owes a lot to MSD radix 
sort because it is also distribution based and since it adopts the two-pass ap- 
proach of a counting phase followed by a distribution phase. 
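For comparison, a single MSD radix sort pass with 8-bit digits (k = 256) consists of exactly this counting phase and distribution phase. The sketch below assumes 32-bit unsigned keys, and its name is hypothetical. The digit `in[i] >> 24` is cheap to recompute, so radix sort can simply evaluate it in both phases. A comparison based algorithm cannot recompute its comparatively expensive bucket search that cheaply, which is why sss-sort remembers the result as an oracle.

```c
#include <stddef.h>
#include <stdint.h>

/* One MSD radix pass over the most significant byte. bucket_start receives
   257 offsets: bucket d occupies out[bucket_start[d] .. bucket_start[d+1]-1]. */
void msd_radix_pass(const uint32_t *in, size_t n, uint32_t *out,
                    size_t bucket_start[257]) {
    size_t count[256] = {0};
    for (size_t i = 0; i < n; i++)
        count[in[i] >> 24]++;                 /* counting phase */
    bucket_start[0] = 0;
    for (size_t d = 0; d < 256; d++)          /* exclusive prefix sums */
        bucket_start[d + 1] = bucket_start[d] + count[d];
    size_t next[256];
    for (size_t d = 0; d < 256; d++)
        next[d] = bucket_start[d];
    for (size_t i = 0; i < n; i++)
        out[next[in[i] >> 24]++] = in[i];     /* distribution phase (stable) */
}
```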

One goal of this paper is to give an example of how it can be algorithmically 
interesting and relevant for performance to consider complexity measures beyond 
the RAM model and memory hierarchies. Table 1 summarizes the complexity
of k-way distribution and k-way merging for five different algorithms and five
different complexity measures (six if we count comparisons). For the sake of this
comparison, log k recursion levels of quicksort are viewed as a means of k-way
distribution. We can see that sss-sort outperforms the other algorithms with
respect to the conditional branches and data dependencies. It also fares very 
well regarding the other measures. The constant factor in the number of I/Os 
might be improvable. Refer to Section 4 for a short discussion. 



2 Super Scalar Sample Sort 

Our starting point is ordinary sample sort. Fig. 1 gives high level pseudocode.
Small inputs are sorted using some other algorithm like quicksort. For larger
inputs, we first take a sample of s = ak randomly chosen elements. The oversampling
factor a allows a flexible tradeoff between the overhead for handling
the sample and the accuracy of splitting. In the full paper we give a heuristic
derivation of a good oversampling factor. Our splitters are those elements whose
rank in the sample is a multiple of a. Now each input element is located among
the splitters and placed into the corresponding bucket. The buckets are sorted
recursively and their concatenation is the sorted output.

Sss-sort is an implementation strategy for the basic sample sort algorithm.
We describe one simple variant here and refer to Section 4 for a discussion of some
possible refinements. All sequences are represented as arrays. More precisely, we
need two arrays of size n: one for the original input and one for temporary
storage. The flow of data between these two arrays alternates between levels
of recursion. If the number of recursion levels is odd, a final copy operation makes
sure that the output is in the same place as the input. Using an array of size n
to accommodate all buckets means that we need to know exactly how big each
bucket is. In radix sort implementations this is done by locating each element
twice. But this would be prohibitive in a comparison based algorithm. Therefore
we use an additional auxiliary array, o, of n oracles: o(i) stores the bucket
index for e_i. A first pass computes the oracles and the bucket sizes. A second
pass reads the elements again and places element e_i into bucket b_{o(i)}. This two
pass approach incurs costs in space and time. However, these costs are rather
small since bytes suffice for the oracles and the additional memory accesses are
sequential and thus can almost completely be hidden via software or hardware
prefetching [8,10]. In exchange, we get simplified memory management with no need
to test for bucket overflows. Perhaps more importantly, decoupling the expensive
tasks of finding buckets and distributing elements to buckets facilitates software
pipelining [3] by the compiler and prevents cache interference between the two parts.
This optimization is also known as loop distribution [16,6].

Theoretically the most expensive and algorithmically the most interesting
part is how to locate elements with respect to the splitters. Fig. 2 gives pseudocode
and a picture for this part. Assume k is a power of two. The splitters are
placed into an array t such that they form a complete binary search tree with
root t_1 = s_{k/2}. The left successor of t_j is stored at t_{2j} and the right successor is
stored at t_{2j+1}. This is the arrangement well known from binary heaps but used
for representing a search tree here. To locate an element a_i, it suffices to travel
down this tree, multiplying the index j by two in each level and adding one if the
element is larger than the current splitter. This increment is the only instruction
that depends on the outcome of the comparison. Some architectures like IA-64
have predicated arithmetic instructions that are only executed if the previously
computed condition code in the instruction’s predicate register is set. Others at
least have a conditional move so that we can compute j := 2j and then, speculatively,
j' := j + 1. Then we conditionally move j' to j. The difference between
such predicated instructions and ordinary branches is that they do not affect the
instruction flow and hence cannot suffer from branch mispredictions.

When the search has traveled down to the bottom of the tree, the index j lies
between k and 2k - 1 so that subtracting k - 1 yields the bucket index o(i). Note
that each iteration of the inner loop needs only four or five machine instructions
so that when unrolling it completely, even for log k = 8 we still have only a
small number of instructions. We also need only three registers for maintaining
the state of the search (a_i, t_j, and j). Since modern processors have many more
t := ⟨s_{k/2}, s_{k/4}, s_{3k/4}, s_{k/8}, s_{3k/8}, s_{5k/8}, s_{7k/8}, ...⟩
for i := 1 to n do                  // locate each element
    j := 1                          // current tree node := root
    repeat log k times              // will be unrolled
        j := 2j + (a_i > t_j)       // left or right?
    j := j - k + 1                  // bucket index
    |b_j|++                         // count bucket size
    o(i) := j                       // remember oracle

Fig. 2. Finding buckets using implicit search trees. The picture is for k = 8. We adopt
the C convention that “x > y” is one if x > y holds, and zero else.

physical renaming registers³, we can afford to run several iterations of the outer
loop in an interleaved fashion (via unrolling or software pipelining) in order to
overlap their execution.
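The construction of t and the locating loop of Fig. 2 can be sketched in C as below. The assumptions are ours: int keys; k = 2^log_k ≤ 256, so that oracles fit in a byte as in Section 3; splitters s_1..s_{k-1} passed as the 0-based array s[0..k-2]; hypothetical function names; and 0-based bucket indices, so we subtract k instead of k - 1. C only guarantees that `e[i] > t[j]` evaluates to 0 or 1. Whether that becomes a predicated instruction or a conditional move is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Places sorted splitters into t[1..k-1] as an implicit complete search tree:
   t[1] = s_{k/2}, and the children of t[j] are t[2j] and t[2j+1]. t[0] is unused. */
void build_tree(const int *s, size_t k, int *t) {
    for (size_t j = 1; j < k; j++) {
        size_t p = 1;                         /* first node index on j's level */
        while (2 * p <= j) p *= 2;
        size_t rank = (2 * (j - p) + 1) * (k / (2 * p));   /* node j holds s_rank */
        t[j] = s[rank - 1];
    }
}

/* Bucket finding (FB): oracle[i] gets the 0-based bucket b with
   s_b < e[i] <= s_{b+1} (s_0 = -inf, s_k = +inf), and bucket_size[b] is
   incremented. The caller must zero bucket_size beforehand. */
void find_buckets(const int *restrict e, size_t n, const int *restrict t,
                  unsigned log_k, uint8_t *restrict oracle,
                  size_t *restrict bucket_size) {
    assert(log_k >= 1 && log_k <= 8);         /* byte oracles need k <= 256 */
    const size_t k = (size_t)1 << log_k;
    for (size_t i = 0; i < n; i++) {
        size_t j = 1;                         /* start at the root */
        for (unsigned level = 0; level < log_k; level++)
            j = 2 * j + (size_t)(e[i] > t[j]);    /* left or right? */
        j -= k;                               /* leaf k..2k-1 -> bucket 0..k-1 */
        bucket_size[j]++;
        oracle[i] = (uint8_t)j;
    }
}
```

Because `log_k` is a runtime parameter here, a compiler cannot fully unroll the inner loop as Section 2 describes. Making it a compile-time constant, for example with one specialized function per k, would be one way to allow that.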

With the oracles it is very easy to distribute the input elements in array
a to an output array a' that stores all the buckets consecutively and without
any gaps. Assume that B[j] is initialized to Σ_{i<j} |b_i|. Then the distribution is
a one-liner:

    for i := 1 to n do a'[B[o(i)]++] := a_i    // (*)

This is quite fast in practice because a and o are read linearly, which enables
prefetching. At any time, writes to a' go to only k distinct positions (the current
end of each bucket), so caching is effective if k is not too big and if all
currently accessed pages fit into the TLB. Data dependencies only show up for the
accesses to B. But this array is very small and hence fits in the first-level cache.
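A C sketch of the distribution step (*) follows. It assumes 0-based bucket indices, byte-sized oracles, and bucket sizes produced by a counting pass like the one in Fig. 2. The function name and its `restrict`-qualified signature are our own choices, in the spirit of the `restrict` usage described in Section 3. B[j] is first set to the exclusive prefix sum of the bucket sizes. After that, every element costs one oracle read, one increment of a B entry, and one write.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a[0..n-1] into out so that all buckets are stored consecutively
   and without gaps. B must have room for k entries. */
void distribute(const int *restrict a, const uint8_t *restrict oracle, size_t n,
                const size_t *restrict bucket_size, size_t k,
                size_t *restrict B, int *restrict out) {
    size_t sum = 0;
    for (size_t j = 0; j < k; j++) {      /* B[j] := sum of sizes of buckets < j */
        B[j] = sum;
        sum += bucket_size[j];
    }
    for (size_t i = 0; i < n; i++)        /* the one-liner (*) */
        out[B[oracle[i]]++] = a[i];
}
```

After the loop, B[j] has advanced to the end of bucket j, which is also the start of bucket j + 1.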




3 Experiments 

We have implemented sss-sort on an Intel server running Red Hat Enterprise
Linux AS 2.1 with four 1.4 GHz Itanium 2 processors and 1 GByte of RAM.
We used Intel’s C++ compiler v8.0 (Build 20031017) [6]. We have chosen this
platform because the hardware and, more importantly, the compiler have good
support for the two main architectural features we exploit: predicated instructions
and software pipelining. We would like to stress, however, that the simple
type of predicated instructions we need, the conditional moves, is supported
by all modern processor architectures like the Intel Pentium II/III/4, the AMD
Athlon and Opteron, Alpha, PowerPC and the IBM Power3/4/5 processors.
Hence, with an appropriate compiler or manual coding, our algorithm should

³ For instance, the Pentium 4 has 126 physical registers. The eight architected registers
of IA-32 are internally mapped to these physical registers via renaming, which makes it
possible to resolve all false dependencies that are due to the limited number of architected
registers [8]. In our case, this means that different iterations of the outer loop can
be mapped to different registers, allowing them to be executed in parallel.
Fig. 3. Breakdown of the execution time of sss-sort (divided by n log n) into phases.
“FB” denotes the finding of buckets for the elements, “DIST” the distribution of the
elements to the buckets, “BA” the base sorting routines. The remaining time is spent
in finding the splitters etc.



work well on most machines (like on the Pentium 4, as demonstrated at the end
of this section).

We have applied the restrict keyword from the ANSI/ISO C standard 
C99 several times to communicate to the compiler that our different arrays do
not overlap and are not accessed through any other aliased pointer. This was
necessary to enable the compiler to software pipeline the two major critical 
loops (“find the buckets” from Fig. 2, abbreviated “FB” in the following, and 
“distribute to buckets” ((*) in Sec. 2), abbr. “DIST”). 

Prefetch instructions and predication are utilized automatically by the compiler
as intended in Section 2. The compiler flags used are “-O3 -prof_use
-restrict”, where “-prof_use” indicates that we have performed profiling runs
(separately for each tested sorting algorithm). Profiling gives a 9% speedup for
sss-sort. Our implementation uses k = 256 to allow byte-size oracles. Sss-sort
calls itself recursively to sort buckets of size larger than 1000 (cf. Fig. 1). It
uses quicksort between 100 and 1000, insertion sort between 5 and 100, and
customized straight-line code for n < 5.

Below we report on experiments for 32 bit random integers in the range 
[0, 10®] . We have not varied the input distribution since sample sort using random 
sampling is largely independent of the input distribution.⁴ The figures show the
execution time divided by n log n on the y axis, i.e., we give something like “time

⁴ Our current simple implementation would suffer in the presence of many identical keys.
However, this could be fixed without much overhead: if s_{i-1} < s_i = s_{i+1} = ... = s_j,
j > i, change s_i to s_i - 1. Do not recurse on buckets b_{i+1}, ..., b_j; they all contain
identical keys.
Fig. 4. Execution time on the Itanium, divided by n log n, for the Intel implementation
of STL::sort, the corresponding GCC implementation and for sss-sort. The measurements
were repeated sufficiently often to require at least 10 seconds for all algorithms.



per comparison”. For an O(n log n) algorithm one would expect a flat line in the
traditional RAM model of computation. Deviations from this expectation are
thus signals of architectural effects.

Fig. 3 provides a breakdown of the execution time of sss-sort into different 
phases. It is remarkable that the central phases FB and DIST account only for 
a minor proportion of the total execution time. The reason is that software 
pipelining turns out to be highly effective here: It reduces the schedule lengths 
of the loops FB and DIST from 60 and 6 to 11 and 3 cycles, respectively (this 
can be seen from the assembly code on this statically scheduled architecture). 
Furthermore, we have measured that for relatively small inputs (n < 2^®), it 
takes on the average only 11.7 cycles to execute the 63 instructions in the loop 
body of FB, and 4 cycles to execute the 14 instructions in that of DIST. This 
yields dynamic IPC (instructions per clock) rates of 5.4 and 3.5, respectively; 
the former is not far from the maximum of 6 on this architecture. 
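The quoted IPC rates follow directly from the instruction counts and measured cycle counts given above. The schedule lengths before and after software pipelining also give the reduction factors for FB and DIST (simple arithmetic on the reported numbers):

```latex
\mathrm{IPC}_{\mathrm{FB}} = \frac{63}{11.7} \approx 5.4 \approx 0.9 \cdot 6,
\qquad
\mathrm{IPC}_{\mathrm{DIST}} = \frac{14}{4} = 3.5,
\qquad
\frac{60}{11} \approx 5.5, \quad \frac{6}{3} = 2 .
```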

The IPC rate of FB decreases only slightly as n grows and is still high with 
4.5 at n = 2^®. In contrast, the IPC rate of DIST begins to decrease when the 
memory footprint of the algorithm (approx. 4n + 4n + n = 9n bytes) approaches the
L3 cache size of 4 MB (this can also be seen in Fig. 3, starting at n = 2^^). Then 
the stores in DIST write through the caches to the main memory and experience 
a significant latency, but on the Itanium 2, not more than 54 store requests 
can be queued throughout the memory hierarchy to cover this latency [10,21]: 
Consequently, the stall times during the execution of DIST increase (also those 
due to TLB misses). The IPC of its loop body drops to 0.8 at n = 2^®. 
Fig. 5. The execution time spent in the different recursion levels of sss-sort. The total
execution time (> 0) shows small distortions compared to Fig. 3, due to the measurement
overhead.



However, Fig. 3 also shows — together with Fig. 5 — that at the same time 
the first recursion level sets in and reduces the time needed for the base cases, 
thus ameliorating this increase. 

Fig. 4 compares the timing for our algorithm with two previous quicksort 
implementations. The first one is std::sort from the Dinkumware STL library
delivered with Intel’s C++ compiler v8.0. The second one is std::sort from the
STL library included in GCC v3.3.2. The latter routine is much faster than the 
Dinkumware implementation and fared extremely well in a recent comparison 
of the best sorting algorithms [5]: There it remained unbeaten on the Pentium 
4 and was outperformed on the Itanium 2 only by funnelsort for n > 2^"^ (by a 
factor of up to 1.18). 

As Fig. 4 shows, our sss-sort beats its closest competitor, GCC STL, by at 
least one third; the gain over GCC STL grows with n and reaches more than 
100%. The flatness of the sss-sort curve in the figure (compared to GCC STL) 
demonstrates the high cache efficiency of our algorithm. 

We have repeated these experiments on a 2.66 GHz Pentium 4 (512 KB L2 
cache) machine running NetBSD 1.6.2, using the same compiler in the IA-32 ver- 
sion (Build 20031016). Since the latter does not perform software pipelining, we 
have unrolled each of the loops FB and DIST twice to expose the available paral- 
lelism. We have added the flag “-xN” to enable optimizations for the Pentium 4, 
especially the conditional moves. These were applied in FB as intended, except 
for four move instructions, which were still dependent on conditional branches. 
These moves were turned into conditional moves manually in order to obtain a 
branchless loop body. 
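The paper does not list its unrolled Pentium 4 code. The following is only our sketch of what unrolling the outer loop of FB twice can look like in C. Two independent tree searches, `j0` and `j1`, advance in lockstep, so an out-of-order core can overlap their compare-and-add chains. We assume int keys, the implicit tree t[1..k-1] of Fig. 2, k = 2^log_k ≤ 256, 0-based bucket indices, and a hypothetical function name. Whether `e[i] > t[j]` turns into the conditional moves mentioned in the text is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FB with the outer loop unrolled twice. bucket_size must be zeroed by the caller. */
void find_buckets_x2(const int *restrict e, size_t n, const int *restrict t,
                     unsigned log_k, uint8_t *restrict oracle,
                     size_t *restrict bucket_size) {
    assert(log_k >= 1 && log_k <= 8);
    const size_t k = (size_t)1 << log_k;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        size_t j0 = 1, j1 = 1;                /* two independent searches */
        for (unsigned level = 0; level < log_k; level++) {
            j0 = 2 * j0 + (size_t)(e[i] > t[j0]);
            j1 = 2 * j1 + (size_t)(e[i + 1] > t[j1]);
        }
        j0 -= k;
        j1 -= k;
        oracle[i] = (uint8_t)j0;
        oracle[i + 1] = (uint8_t)j1;
        bucket_size[j0]++;
        bucket_size[j1]++;
    }
    if (i < n) {                              /* odd n: one element left */
        size_t j = 1;
        for (unsigned level = 0; level < log_k; level++)
            j = 2 * j + (size_t)(e[i] > t[j]);
        j -= k;
        oracle[i] = (uint8_t)j;
        bucket_size[j]++;
    }
}
```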
Fig. 6. The same comparison as in Figure 4 on a Pentium 4 processor.



The breakdown of the execution time, available in the full paper, shows that 
FB and DIST are here more expensive than on the Itanium. One reason for this is 
the lower parallelism of the Pentium 4 (the maximal sustainable IPC is about 2; 
we measure 1.1-1.8 IPC for the loop body of FB). Nevertheless, sss-sort manages
to outperform GCC STL for most input sizes by 15-20% (Figure 6). The drop 
at n = 2^^ could probably be alleviated by adjusting the splitter number k — 1 
and the base sorting routines. 

4 Discussion 

Sss-sort is a comparison based sorting algorithm whose O(n log n) term contains
only very simple operations without unnecessary conditional branches or short
range data dependencies on the critical path. The result is that the cost for
this part on modern processors is dwarfed by “linear” terms. This observation
underlines that algorithm performance engineering should pay more attention
to technological properties of machines like parallelism, data dependencies, and
memory hierarchies. We have to take these aspects into account even if they show
up in lower order terms that contribute a logarithmic factor fewer instructions than
the “inner loop”.

Apart from these theoretical considerations, sss-sort is a candidate for being 
the best comparison based sorting algorithm for large data sets. Having made 
this claim, it is natural to discuss remaining weaknesses: So far we have found 
only one architecture with a sufficiently sophisticated compiler to fully harvest 
the potential advantages of sss-sort. Perhaps sss-sort can serve as a motivating 
example for compiler writers to have a closer look at exploiting predication. 
Another disadvantage compared to quicksort is that sss-sort is not inplace.
One could make it almost inplace, however. This is easiest to explain for
the case that both input and output are a sequence of blocks, holding c elements
each for some appropriate parameter c. The required pointers cost space
O(n/c). Sampling takes sublinear space and time. Distribution needs at most 2k
additional blocks and can otherwise recycle freed blocks of the input sequence.
Although software pipelining may be more difficult for this distribution loop, the
block representation facilitates a single pass implementation without the time
and space overhead for oracles so that good performance may be possible. Since
it is possible to convert inplace between block list representation and an array
representation in linear time, one could actually attempt an almost inplace
implementation of sss-sort.⁵ If c is sufficiently big, the conversion should not be
much more expensive than copying the entire input once, i.e., we are talking
about conversion speeds of GBytes per second.

Algorithms that are conspicuously absent from Tab. 1 are distribution based
counterparts of register based merging and funnel-merging. Register based
distribution, i.e., keeping the search tree in registers, would be possible but it would
reintroduce expensive conditional branches and only gain some rather cheap
in-cache memory accesses. On the other hand, cache-oblivious or multi-level
cache-aware k-way distribution might be interesting (see also [7]). An interesting
feature of sss-sort in this respect is that the two-phase approach allows us
to precompute the distribution information for a rather large k (in particular 
larger than the TLB) and then implement the actual distribution exploiting sev- 
eral levels of the memory hierarchy. Let us consider an example. Assume we have 
2^® bytes of L3 cache and want to distribute with k = 2^^. Then we distribute 
2^® byte batches of data to a temporary array. This array will be in cache and its 
page addresses will be in the TLB. After a batch is distributed, each bucket is 
moved to its final destination in main memory. The cost for TLB misses will now 
be amortized over an average bucket size of 2® bytes. A useful side effect is that 
cache misses due to the limited associativity of hardware caches are eliminated 
[23,15]. 
Abstract. Spanners are sparse subgraphs that preserve distances up to
a given factor in the underlying graph. Recently spanners have found
important practical applications in metric space searching and message
distribution in networks. These applications use some variant of the so-called
greedy algorithm for constructing the spanner, an algorithm that
mimics Kruskal’s minimum spanning tree algorithm. Greedy spanners
have nice theoretical properties, but their practical performance with
respect to total weight is unknown. In this paper we give an exact algorithm
for constructing minimum-weight spanners in arbitrary graphs.
By using the solutions (and lower bounds) from this algorithm, we experimentally
evaluate the performance of the greedy algorithm for a set
of realistic problem instances.



1 Introduction 

Let G = (V, E) be an undirected and edge-weighted graph. A t-spanner in G is
a subgraph G' = (V, E') of G such that the shortest path between any pair of
nodes u, v ∈ V is at most t times longer in G' than in G (where t > 1) [22].
The NP-hard minimum-weight t-spanner problem (MWSP) is to construct a
t-spanner with minimum total weight [3].

Spanners — and algorithms for their construction — form important building 
blocks in the design of efficient algorithms for geometric problems [10, 14, 19, 23]. 
Also, low-weight spanners have recently found interesting practical applications 
in areas such as metric space searching [21] and broadcasting in communication 
networks [11]. A spanner can be used as a compact data structure for holding 
information about (approximate) distances between pairs of objects in a large 
metric space, say, a collection of electronic documents; by using a spanner instead 
of a full distance matrix, significant space reductions can be obtained when using 
search algorithms like AESA [21]. For message distribution in networks, spanners 
can simultaneously offer both low cost and low delay when compared to existing 
alternatives such as minimum spanning trees (MSTs) and shortest path trees. 
Experiments with constructing spanners for realistic communication networks
show that spanners can achieve a cost that is close to the cost of an MST while
significantly reducing delay (i.e., the lengths of shortest paths between pairs of nodes) [11].

The classical algorithm for constructing a low-weight $t$-spanner for a graph
$G = (V, E)$ is the greedy algorithm [1]. All practical applications of spanners use
some variant of this algorithm. The greedy algorithm first constructs a graph
$G' = (V, E')$ where $E' = \emptyset$. Then the set of edges $E$ is sorted by non-decreasing
weight and the edges are processed one by one. An edge $e = (u, v)$ is appended
to $E'$ if the weight of a shortest path in $G'$ between $u$ and $v$ exceeds $t \cdot c_e$,
where $c_e$ is the weight of edge $e$. (Note that the greedy algorithm is clearly a
polynomial-time algorithm; for large values of $t$ the algorithm is identical to
Kruskal's algorithm for constructing MSTs.)
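The greedy procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the implementation evaluated in this paper: it assumes an undirected graph with nodes numbered $0, \ldots, n-1$ and edges given as `(u, v, weight)` triples, and it answers each "is there a short enough path?" query with a Dijkstra search that is cut off at the bound $t \cdot c_e$. The function names are our own.

```python
import heapq
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight) of an undirected edge


def path_within(adj: List[List[Tuple[int, float]]], src: int, dst: int, limit: float) -> bool:
    """Return True if the current spanner has a src-dst path of weight <= limit.

    Plain Dijkstra that never relaxes beyond `limit`, so it can stop early.
    """
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == dst:
            return True
        if d > dist.get(x, math.inf):
            continue  # stale heap entry
        for y, w in adj[x]:
            nd = d + w
            if nd <= limit and nd < dist.get(y, math.inf):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return False


def greedy_spanner(n: int, edges: List[Edge], t: float) -> List[Edge]:
    """Greedy t-spanner: scan edges by non-decreasing weight and keep an edge
    only if the spanner built so far has no u-v path of weight <= t * c_e."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    spanner: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if not path_within(adj, u, v, t * w):
            adj[u].append((v, w))
            adj[v].append((u, w))
            spanner.append((u, v, w))
    return spanner
```

For the unit square with both diagonals (corners numbered 0 to 3 around the square), `t = 2` keeps the four sides, while `t = 3.5` keeps only three sides, i.e. a minimum spanning tree, which illustrates the Kruskal-like behaviour for large $t$.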

The output of the greedy algorithm is a so-called greedy spanner. A greedy 
spanner has several nice theoretical properties, which are summarized in Sec- 
tion 2. However, it is not known how well the greedy algorithm actually performs 
in practice, i.e., how well the greedy spanner approximates the minimum-weight 
spanner for realistic problem instances. 

In this paper we present the first exact algorithm for MWSP. We give an 
integer programming formulation based on column generation (Section 3). The 
problem decomposes into a master problem, which becomes a set-covering-like
problem, and a number of subproblems, which are constrained shortest path
problems. The integer program is solved by branch-and-bound with lower bounds
obtained by linear programming relaxations. 

We have performed experiments with the new exact algorithm on a range of 
problem instances (Section 4). Our focus has been on comparing greedy spanners
with minimum-weight spanners (or lower bounds on their weights). The
results show that the greedy spanner is a surprisingly good approximation to a
minimum-weight spanner: on average within 5% of optimum for the set of
problem instances considered. Our conclusions and suggestions for further work 
are given in Section 5. 



2 Greedy Spanner Algorithm 

Let $n = |V|$ and $m = |E|$ for our given graph $G = (V, E)$. For general graphs
it is hard to approximate MWSP within a factor of $\Omega(\log n)$ [9]. However, for
special types of graphs much better approximations can be obtained. Most of
these are obtained by (variants of) the greedy algorithm.

The performance of the greedy algorithm depends on the weight function on
the edges of $G$. For general graphs, Althöfer et al. [1] showed that the greedy
algorithm produces a graph with $O(n^{1+2/(t-1)})$ edges; Chandra et al. [5] showed
that the weight is bounded by $O(n^{(2+\varepsilon)/(t-1)})$ times that of a minimum spanning
tree (MST) for $G$ (for any $\varepsilon > 0$). These bounds improve to $n(1 + O(1/t))$ and
$1 + O(1/t)$, respectively, when $G$ is planar. When the nodes of $G$ correspond to points
in some fixed dimensional space (and edge weights are equal to the Euclidean
distance between the nodes), the number of edges in the greedy spanner is $O(n)$
and the weight $O(1)$ times that of an MST; the constant implicit in the $O$-notation
depends on $t$ and the dimension of the space [6, 7].

Although the bounds for the greedy algorithm are promising for planar and
geometric graphs, to the best of our knowledge, little is known about the
tightness of these bounds and about the practical performance of the greedy
algorithm on realistic problem instances.

A naive implementation of the greedy algorithm runs in O(n^logn) time. 
Significant improvements have been made, both from the theoretical and practical
side, on this running time. Gudmundsson et al. [13] gave an $O(n \log n)$ variant
of the greedy algorithm for points in fixed dimension. Navarro and Paredes [20]
gave several practical variants of the greedy algorithm for general graphs with
running times down to $O(nm \log m)$. They were able to compute approximate
greedy spanners for graphs with several thousand nodes in a few minutes.

Finally, we note that a greedy spanner will always contain an MST for $G$ [22].
In contrast, a minimum-weight spanner need not contain an MST for the
graph. An example is shown in Figure 1.



[Figure panels: (a) original complete graph (edges omitted); (b) MST of original graph; (c) minimum-weight spanner ($t = 2.5$); a fourth panel showing a spanner.]

Fig. 1. An example showing that a minimum-weight spanner need not contain an MST
of the original graph.
3 Column Generation Approach

In the new exact algorithm we consider the following generalization of MWSP.
We are given a connected undirected graph $G = (V, E)$ with edge weights $c_e$,
$e \in E$, a set of node pairs $K = \{(u_i, v_i)\}$, $i = 1, \ldots, k$, and a stretch factor $t > 1$.
Let $p_{u_i v_i}$ denote a shortest path in $G$ between $u_i$ and $v_i$, $(u_i, v_i) \in K$, and let
$c(p_{u_i v_i})$ denote the length of $p_{u_i v_i}$. The (generalized) minimum-weight spanner
problem (MWSP) consists of finding a minimum-cost subset of edges $E' \subseteq E$
such that the shortest paths $p'_{u_i v_i}$, $i = 1, \ldots, k$, in the new graph $G' = (V, E')$
are no longer than $t \cdot c(p_{u_i v_i})$. Note that this problem reduces to the normal
MWSP when $K = V \times V$.

Obviously, if t is sufficiently large, an optimal solution to the MWSP will 
be a minimal spanning forest. On the other hand, if t is close to 1, an optimal 
solution will include all edges on the shortest paths $p_{u_i v_i}$ of $G$.

Our exact algorithm for solving MWSP is based on modeling the problem as 
an integer program, and solving this integer program by branch and bound using 
bounds from linear programming relaxations. MWSP can be modeled in several 
different ways, but the shortest path constraints are particularly difficult to 
model efficiently. Therefore, we have chosen a model in which paths are decision 
variables. 

Given an instance of the MWSP, let $P_{uv}$ denote the set of paths between
$u$ and $v$ with length smaller than or equal to $t \cdot c(p_{uv})$, $\forall (u, v) \in K$, and let
$P = \bigcup_{(u,v) \in K} P_{uv}$. Furthermore, let the indicator variables $\delta_p^e$ be defined as
follows: $\delta_p^e = 1$ if edge $e \in E$ is on path $p \in P$, and $\delta_p^e = 0$ otherwise. Let
the decision variables $x_e$, $e \in E$, equal 1 if edge $e$ is part of the solution and 0
otherwise, and let the decision variables $y_p$, $p \in P_{uv}$, equal 1 if path $p$ connects $u$ and
$v$ in the solution and 0 otherwise, $\forall (u, v) \in K$. Then MWSP can be formulated
as follows:



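$$
\begin{array}{llr}
\text{minimize} & \sum_{e \in E} c_e x_e & (1.1)\\
\text{subject to} & \sum_{p \in P_{uv}} y_p \delta_p^e \le x_e \quad \forall e \in E,\ \forall (u,v) \in K & (1.2)\\
& \sum_{p \in P_{uv}} y_p \ge 1 \quad \forall (u,v) \in K & (1.3)\\
& x_e \in \{0,1\} \quad \forall e \in E & (1.4)\\
& y_p \in \{0,1\} \quad \forall p \in P & (1.5)
\end{array}
$$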



The objective function (1.1) minimizes the total cost of the edges in the spanner.
Constraints (1.2) require that, for a given pair of nodes $(u, v)$, all edges on the
selected path connecting $u$ and $v$ must be part of the spanner. Constraints (1.3)
say that every pair of nodes in $K$ must be connected by at least one path.
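To make the meaning of the constraints concrete, here is a minimal Python sketch (our own helper names, assuming integer nodes and `(u, v, weight)` edge triples) that checks whether a candidate edge set $E'$ is feasible for the generalized MWSP, i.e. whether every pair $(u, v) \in K$ is connected in $G'$ by a path no longer than $t \cdot c(p_{uv})$. Such an $E'$ corresponds to a 0-1 vector $x$ for which path variables $y$ satisfying (1.2)-(1.5) exist: take $y_p = 1$ for one shortest path in $G'$ per pair. The sketch reruns Dijkstra for every pair, which is fine for small illustrations only.

```python
import heapq
import math
from typing import Iterable, List, Tuple

Edge = Tuple[int, int, float]


def distances_from(n: int, edges: Iterable[Edge], src: int) -> List[float]:
    """Single-source Dijkstra on an undirected graph with nodes 0..n-1."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [math.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, w in adj[x]:
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


def is_t_spanner(n: int, all_edges: List[Edge], chosen: List[Edge],
                 pairs: Iterable[Tuple[int, int]], t: float, tol: float = 1e-9) -> bool:
    """True if dist_{G'}(u, v) <= t * dist_G(u, v) for every (u, v) in pairs."""
    for u, v in pairs:
        d_g = distances_from(n, all_edges, u)[v]
        d_sub = distances_from(n, chosen, u)[v]
        if d_sub > t * d_g + tol:
            return False
    return True


def weight(edges: Iterable[Edge]) -> float:
    """Objective (1.1): total cost of the selected edges."""
    return sum(w for _, _, w in edges)
```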

We will solve the model with a branch and bound algorithm using the linear
relaxation lower bound, which is obtained from (1.1)-(1.5) by relaxing the
constraints (1.4) and (1.5). This model contains $|E| + |P|$ variables and $(|E| + 1)|K|$
constraints. Since the number of variables may be exponential in the input size,
we will not solve the relaxation of (1.1)-(1.5) directly; in fact, we will not even
write up the model explicitly. Instead, we will solve the restricted master problem (RMP):

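$$
\begin{array}{llr}
\text{minimize} & \sum_{e \in E} c_e x_e & \\
\text{subject to} & \sum_{p \in P'_{uv}} y_p \delta_p^e \le x_e \quad \forall e \in E,\ \forall (u,v) \in K & \\
& \sum_{p \in P'_{uv}} y_p \ge 1 \quad \forall (u,v) \in K & (2)\\
& x_e \ge 0 \quad \forall e \in E & \\
& y_p \ge 0 \quad \forall p \in P' &
\end{array}
$$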

where $P'_{uv} \subseteq P_{uv}$, $\forall (u, v) \in K$, and $P' = \bigcup_{(u,v) \in K} P'_{uv}$. To begin with, it
is sufficient that $P'_{uv}$ contains only one path for every $(u, v) \in K$. Iteratively, we
will add paths to $P'$, extending the RMP. This procedure is known as delayed
column generation. We use the Simplex method to solve the RMP in every
iteration of the delayed column generation, which provides us with an optimal
dual vector of the RMP. Now, consider the dual linear program of the full primal
problem with dual variables $\pi_e^{uv}$ and $\sigma_{uv}$, $\forall (u, v) \in K$, $\forall e \in E$:

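$$
\begin{array}{llr}
\text{maximize} & \sum_{(u,v) \in K} \sigma_{uv} & \\
\text{subject to} & \sum_{(u,v) \in K} \pi_e^{uv} \le c_e \quad \forall e \in E & \\
& -\sum_{e \in E} \delta_p^e \pi_e^{uv} + \sigma_{uv} \le 0 \quad \forall p \in P_{uv},\ \forall (u,v) \in K & (3)\\
& \pi_e^{uv} \ge 0 \quad \forall e \in E,\ \forall (u,v) \in K & \\
& \sigma_{uv} \ge 0 \quad \forall (u,v) \in K &
\end{array}
$$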

If the dual values obtained by solving the RMP constitute a feasible solution
to (3), the weak duality theorem tells us that the dual solution is an optimal solution
to (3), which implies that the optimal solution to the RMP is also an optimal
solution to the full problem. Otherwise the dual solution to the RMP violates
one of the constraints $-\sum_{e \in E} \delta_p^e \pi_e^{uv} + \sigma_{uv} \le 0$ of (3) for some $p \in P$. The
amount of violation is called the reduced cost of $p \in P$ with respect to $\pi$ and $\sigma$,
denoted

$$c_p^{\pi,\sigma} = \sum_{e \in E} \delta_p^e \pi_e^{uv} - \sigma_{uv}, \qquad p \in P_{uv}.$$

If $c_p^{\pi,\sigma} \ge 0$, $\forall p \in P$, the dual vector constitutes a feasible solution to (3), which
in turn means that the current primal solution to the RMP is an optimal solution
for the full master problem. In this case we may stop the delayed column
generation procedure. Otherwise we add a path $p$ with $c_p^{\pi,\sigma} < 0$ to the RMP,
which corresponds to adding a violated dual constraint to the dual of the RMP.
Hence, in every iteration of the delayed column generation procedure we solve the
problem $\min_{p \in P} c_p^{\pi,\sigma} = \min_{p \in P} \left( \sum_{e \in E} \delta_p^e \pi_e^{uv} - \sigma_{uv} \right)$, which is called the pricing
problem. For a given $(u, v) \in K$, $\sigma_{uv}$ is a constant and hence the pricing problem
is the problem of finding a shortest path $p \in P_{uv}$ between $u$ and $v$ with edge
weights $\pi_e^{uv}$. This problem is recognized as the constrained shortest path problem
(CSPP), since we require that $p \in P_{uv}$, i.e., that the length of $p$ is no longer
than $t$ times the length of the shortest path between $u$ and $v$ with respect to the
original edge costs.
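The reduced-cost test that drives column generation is simple arithmetic once the duals are known. The Python sketch below assumes (our choice, for illustration) that a candidate path is a list of edge identifiers, that the duals $\pi_e^{uv}$ of one pair are stored in a dictionary keyed by edge, and that missing entries are zero. It returns the candidate with the most negative reduced cost, or `None` if no candidate prices out. In the actual algorithm the candidates are not enumerated; the best one is found by solving the CSPP.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

EdgeId = Hashable


def reduced_cost(path: Sequence[EdgeId], pi_uv: Dict[EdgeId, float], sigma_uv: float) -> float:
    """c_p^{pi,sigma} = sum_{e in p} pi_e^{uv} - sigma_uv for a path p in P_uv."""
    return sum(pi_uv.get(e, 0.0) for e in path) - sigma_uv


def best_column(candidates: List[Sequence[EdgeId]], pi_uv: Dict[EdgeId, float],
                sigma_uv: float, tol: float = 1e-9
                ) -> Optional[Tuple[Sequence[EdgeId], float]]:
    """Return (path, reduced cost) of the most negative candidate, or None if
    every candidate has reduced cost >= 0 (nothing to add for this pair)."""
    best = None
    for p in candidates:
        rc = reduced_cost(p, pi_uv, sigma_uv)
        if rc < -tol and (best is None or rc < best[1]):
            best = (p, rc)
    return best
```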

3.1 Constrained Shortest Path Problem 

The (resource) constrained shortest path problem (CSPP) is a generalization 
of the classic shortest path problem, where we impose a number of additional 
resource constraints. Even in the case of a single resource constraint the problem 
is weakly NP-complete [12]. The problem is formally stated as follows: Given a
graph $G = (V, E)$ with a weight $w_e \ge 0$ and a cost $c_e \ge 0$ on every edge $e \in E$,
a source $s \in V$, a target $d \in V$ and a resource limit $B$, find a shortest path
from $s$ to $d$ with respect to the cost of the edges, such that the sum of the
weights on the path is at most $B$.

Several algorithms have been proposed for its solution (see e.g. [28] for a 
survey). Algorithms based on dynamic programming were presented in [2, 17]. 
A labelling algorithm that exploits the dominance relation between paths (i.e.,
path $p$ dominates path $q$ if it has cost $c_p \le c_q$ and weight $w_p \le w_q$) was proposed
by Desrochers and Soumis [8]. Lagrangean relaxation has been used successfully
to move the difficult resource constraint into the objective function in a two-phase
method, first solving the Lagrangean relaxation problem and then closing the
duality gap [15, 28]. The CSPP can be approximated to any degree by polynomial
algorithms by the fully polynomial time approximation scheme (FPTAS) given
by Warburton [26].

We use a labelling algorithm similar to Desrochers and Soumis [8] with the 
addition of a method of early termination of infeasible paths. Our algorithm 
grows paths starting from the source $s$ until we reach the target $d$. In the algorithm
we represent a path $p_i$ by a label $(p_i, n_i, w_i, c_i)$, stating that the path $p_i$
ending in node $n_i$ has total weight $w_i$ and total cost $c_i$. If two paths $p_1, p_2$ end
in the same node, $p_1$ dominates $p_2$ if $w_1 \le w_2$ and $c_1 \le c_2$. We may discard all
dominated labels since they cannot be extended to optimal paths.

In our algorithm, we store the labels (paths) in a heap $H$ ordered by the cost
of the path. Iteratively, we pick the cheapest path from the heap and extend it
along all edges incident to its endpoint. If the extended path does not violate
the maximum weight constraint, we create a new label for this path. We remove
all dominated labels, both labels generated earlier that are dominated by the
new labels and new labels which are dominated by labels generated earlier. This
is done efficiently by maintaining a sorted list for every node $n$ consisting of all
labels for node $n$. When creating a new label, we only need to run through this
list to remove dominated labels. When we reach the target node $d$, we know we
have created the cheapest path satisfying the maximum weight constraint, since
we pick the cheapest label in every iteration.

The algorithm above is the standard labelling algorithm. Before we start the
labelling algorithm we do two shortest path computations, creating a shortest
path tree $T^c$ from $d$ with respect to edge costs $c_e$ and a shortest path tree $T^w$
from $d$ with respect to edge weights $w_e$. Whenever we extend a path to a node $n$,
we check if the new label $(p, n, w, c)$ satisfies $w + T^w(n) \le B$ and $c + T^c(n) \le M$,
where $M$ is the maximum cost constraint imposed on the problem (note that
we are only interested in finding paths with reduced cost $c_p^{\pi,\sigma} < 0$). If one of the
inequalities is not satisfied we can discard this path, since it will not be able to
reach $d$ within the maximum weight and cost limits. The algorithm is sketched
below.

CSPP(G, w, c, B, M, s, d)

1. Compute shortest path trees $T^w$ and $T^c$.

2. Initialize the heap $H$ with the label $(\emptyset, s, 0, 0)$.

3. Pick a cheapest label $(p, n, w, c)$ in $H$. If $n = d$ then we are done: $p$ is the
cheapest path satisfying the maximum weight constraint.

4. For all nodes $n_i$ adjacent to $n$ by edge $e_i$, extend the path and create new
labels $(p \cup \{e_i\}, n_i, w + w_{e_i}, c + c_{e_i})$. Discard new labels where $w + w_{e_i} + T^w(n_i) > B$
or $c + c_{e_i} + T^c(n_i) > M$. Discard dominated labels. Go to 3.
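A runnable Python sketch of steps 1-4 follows, under our own assumptions: nodes are integers, the graph is undirected and stored as adjacency lists of `(neighbour, weight, cost)` triples, all weights and costs are nonnegative, and each label carries its full node sequence (adequate for small instances; a production version would keep predecessor pointers). Unlike the sorted per-node lists described in the text, dominance is checked against a plain list, which is simpler but slower. The names `cspp`, `dist_to` and `build_graph` are ours.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# node -> list of (neighbour, weight w_e, cost c_e); each undirected edge is stored twice
Graph = Dict[int, List[Tuple[int, float, float]]]


def build_graph(edges: List[Tuple[int, int, float, float]]) -> Graph:
    g: Graph = {}
    for u, v, w, c in edges:
        g.setdefault(u, []).append((v, w, c))
        g.setdefault(v, []).append((u, w, c))
    return g


def dist_to(g: Graph, target: int, idx: int) -> Dict[int, float]:
    """Dijkstra from `target` on component idx (1 = weight, 2 = cost).
    The result gives the bounds T^w(n) or T^c(n) on the rest of a path."""
    dist = {v: math.inf for v in g}
    dist[target] = 0.0
    heap = [(0.0, target)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for arc in g[x]:
            y, nd = arc[0], d + arc[idx]
            if nd < dist[y]:
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist


def cspp(g: Graph, B: float, M: float, s: int, d: int
         ) -> Optional[Tuple[float, float, List[int]]]:
    """Cheapest s-d path with total weight <= B (and cost <= M), or None."""
    Tw, Tc = dist_to(g, d, 1), dist_to(g, d, 2)                 # step 1
    labels: Dict[int, List[Tuple[float, float]]] = {v: [] for v in g}
    labels[s].append((0.0, 0.0))                                # step 2
    heap = [(0.0, 0.0, s, (s,))]                                # (cost, weight, node, path)
    while heap:
        c, w, n, path = heapq.heappop(heap)                     # step 3
        if (w, c) not in labels[n]:
            continue                                            # removed as dominated
        if n == d:
            return c, w, list(path)
        for m, we, ce in g[n]:                                  # step 4
            nw, nc = w + we, c + ce
            if nw + Tw[m] > B or nc + Tc[m] > M:
                continue                                        # cannot reach d within limits
            if any(ow <= nw and oc <= nc for ow, oc in labels[m]):
                continue                                        # new label is dominated
            labels[m] = [(ow, oc) for ow, oc in labels[m] if not (nw <= ow and nc <= oc)]
            labels[m].append((nw, nc))
            heapq.heappush(heap, (nc, nw, m, path + (m,)))
    return None
```

In the column generation loop one would call `cspp` once per pair $(u, v) \in K$, with the duals $\pi_e^{uv}$ as costs, the original edge lengths $c_e$ as weights, $B = t \cdot c(p_{uv})$ and $M = \sigma_{uv}$.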

In every iteration of the column generation procedure, we run the CSPP
algorithm for every pair of nodes $(u, v) \in K$ with edge costs $\pi_e^{uv}$ and edge weights
$c_e$, maximum cost limit $M = \sigma_{uv}$ and maximum weight limit $B = t \cdot c(p_{uv})$.
Note that it is not possible to run an “all constrained shortest path” algorithm,
since the edge costs $\pi_e^{uv}$ differ for every pair of nodes.

3.2 Implementation Details 

Delayed column generation allows us to work on an LP which only contains a small
number of variables. However, our restricted master problem (2) still contains a
large number of constraints (for complete graphs there will be $O(n^4)$ constraints).
The constraints are all needed to ensure primal feasibility of the model, but in
practice only a small number of the constraints will be active. We will remove
the constraints of the form $\sum_{p \in P'_{uv}} y_p \delta_p^e \le x_e$ from the RMP to reduce the size
of the master LP. This will allow infeasible solutions, but we will check if any
violated constraints exist and add these constraints to the model.

Checking whether a constraint is violated is straightforward: for all node
pairs $(u, v) \in K$ and all edges $e \in E$, we compute the sum $\sum_{p \in P'_{uv}} y_p \delta_p^e$ and
check whether this quantity is greater than $x_e$. If so, we add the constraint
$\sum_{p \in P'_{uv}} y_p \delta_p^e \le x_e$ to the model. Computational experiments show that we only
need to add a small fraction of the constraints to the model. This greatly speeds
up the Simplex iterations in the algorithm.
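The separation check can be sketched as follows. This is a minimal Python illustration with hypothetical data structures of our own: each generated column is stored as `(pair, edges)`, and the current LP values of $y_p$ and $x_e$ are given as dictionaries. Because $x_e \ge 0$, only edges lying on some path with $y_p > 0$ can violate a removed constraint, so the loop only visits those.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def violated_constraints(columns: Dict[int, Tuple[Pair, Set[Hashable]]],
                         y: Dict[int, float], x: Dict[Hashable, float],
                         tol: float = 1e-9) -> List[Tuple[Pair, Hashable]]:
    """Return every (pair, edge) with sum_{p in P'_uv} y_p * delta_p^e > x_e."""
    load: Dict[Tuple[Pair, Hashable], float] = defaultdict(float)
    for pid, (pair, edges) in columns.items():
        yp = y.get(pid, 0.0)
        if yp > tol:
            for e in edges:
                load[(pair, e)] += yp  # accumulate y_p over paths of this pair that use e
    return sorted((k for k, val in load.items() if val > x.get(k[1], 0.0) + tol), key=repr)
```

Each returned pair-edge combination would then be added to the RMP as a constraint $\sum_{p \in P'_{uv}} y_p \delta_p^e \le x_e$ before the LP is re-solved.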
Branching is done by selecting an edge variable $x_e$ with maximum value in
the fractional solution and demanding $x_e = 1$ on one branch and $x_e = 0$ on
the other. On the $x_e = 0$ branch the pricing problem changes slightly, since
edge $e$ is not allowed in any of the paths generated. This can be handled very
efficiently by deleting the edge from the graph on which the pricing algorithm
runs. The $x_e = 1$ branch does not change the pricing problem. We evaluate the
branch and bound nodes in depth-first order.
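The branching rule and the corresponding change to the pricing graph can be sketched as follows. This Python fragment is illustrative only: we assume that "maximum value in the fractional solution" means the largest $x_e$ strictly between 0 and 1 (branching on an already integral variable would not split the problem), we represent edges as hypothetical `(id, u, v, cost)` tuples, and the helper names are ours.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

EdgeRec = Tuple[Hashable, int, int, float]  # (edge id, u, v, cost)


def select_branching_edge(x: Dict[Hashable, float], tol: float = 1e-6) -> Optional[Hashable]:
    """Edge whose LP value is fractional and largest; None if x is integral."""
    fractional = {e: v for e, v in x.items() if tol < v < 1.0 - tol}
    if not fractional:
        return None
    return max(fractional, key=fractional.get)


def pricing_edges(edges: List[EdgeRec], fixed_to_zero: Set[Hashable]) -> List[EdgeRec]:
    """On an x_e = 0 branch the pricing graph is the original graph minus e."""
    return [edge for edge in edges if edge[0] not in fixed_to_zero]
```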

We have used the general branch-and-cut framework ABACUS [25] with 
CPLEX [16] in the implementation of our algorithm. 

4 Experimental Results 

One of the main contributions of this paper is to evaluate the quality of the 
greedy spanner algorithm on a set of relevant problem instances. We have created 
two types of graphs, denoted Euclidean graphs and realistic graphs, respectively. 

The Euclidean graphs are generated as follows. The vertices correspond to
points that are randomly and uniformly distributed in a $k$-dimensional
hypercube. Then a complete Euclidean graph is constructed from the set of
points. Graph classes for all combinations of dimensions $k = 5, 10, 20, 25$ and
$n = 20, 30, 40, 50$ have been generated. For every class of graphs, we have created
50 instances, which we have solved by both the greedy spanner algorithm and
our exact method with stretch factor $t = 1.2, 1.4, 1.6, 1.8$. (Note that this type
of graph is identical to the type of graphs analyzed in [4].)
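A generator for this instance class might look as follows. It is a sketch under our own assumptions, since the paper does not give implementation details: the hypercube is taken to be $[0, 1]^k$, coordinates are drawn with Python's `random` module, and edge costs are Euclidean distances computed with `math.dist` (Python 3.8+).

```python
import itertools
import math
import random
from typing import List, Optional, Tuple


def euclidean_instance(n: int, k: int, seed: Optional[int] = None
                       ) -> Tuple[List[List[float]], List[Tuple[int, int, float]]]:
    """n uniform random points in [0, 1]^k and the complete Euclidean graph on them."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(k)] for _ in range(n)]
    edges = [(i, j, math.dist(points[i], points[j]))
             for i, j in itertools.combinations(range(n), 2)]
    return points, edges
```

For example, `euclidean_instance(20, 5, seed=1)` produces one random instance of the class $n = 20$, $k = 5$ with $20 \cdot 19 / 2 = 190$ edges.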

The realistic graphs originate from the application in broadcasting and
multicasting in communication networks, where spanners may be used to reduce
the cost of sending data while preserving low delay. We have created a set of
instances following the design proposed in [11, 27]. For $n = 16, 32, 64$ we place
$n$ nodes uniformly at random in a square. We add edges according to one of two
different schemes:

1. Place edges completely at random, where the probability of adding an edge
is the same for all pairs of nodes.

2. Favour local edges so that the probability of adding an edge between $u$ and
$v$ is $\alpha e^{-d/(\beta L)}$, where $d$ is the distance between $u$ and $v$ and $L$ is the diameter
of the square.

By varying the probability in method 1, and $\alpha$ and $\beta$ in method 2, we generate
graphs with average node degree $D = 4$ and $D = 8$, respectively. To model
realistic networks, we keep $\beta = 0.14$ fixed and set $\alpha$ to get the desired average
node degree for $n = 16, 32, 64$ [11].

We assign a cost to every edge in the graph according to one of three methods: 

1. Unit cost. All edges have cost 1. 

2. Euclidean distance. All edges have cost equal to the Euclidean distance be- 
tween the endpoints. 

3. Random cost. Edge cost is assigned at random from the set {1, 2, 4, 8, 16}. 
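The two edge-placement schemes and the three cost functions can be combined into one generator. This Python sketch is ours, not the authors' code, and rests on several assumptions: the square is the unit square (so $L = \sqrt{2}$), scheme 2 uses the Waxman-style probability $\alpha e^{-d/(\beta L)}$ capped at 1, and no attempt is made to enforce connectivity, although the MWSP as defined above assumes a connected graph. Tuning $p$ or $\alpha$ to reach a target average degree $D$ is left to the caller, for instance by repeated trials with `average_degree`.

```python
import math
import random
from typing import List, Optional, Tuple


def realistic_instance(n: int, scheme: int, p: float = 0.0, alpha: float = 1.0,
                       beta: float = 0.14, cost: str = "euclidean",
                       seed: Optional[int] = None):
    """Place n nodes uniformly at random in the unit square and add edges.

    scheme 1: every pair gets an edge with the same probability p.
    scheme 2: a pair at distance d gets an edge with probability
              min(1, alpha * exp(-d / (beta * L))), L = diameter of the square.
    cost: "unit", "euclidean" or "random" (drawn from {1, 2, 4, 8, 16}).
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    L = math.sqrt(2.0)
    edges: List[Tuple[int, int, float]] = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pts[i], pts[j])
            prob = p if scheme == 1 else min(1.0, alpha * math.exp(-d / (beta * L)))
            if rng.random() < prob:
                if cost == "unit":
                    c = 1.0
                elif cost == "euclidean":
                    c = d
                else:
                    c = float(rng.choice([1, 2, 4, 8, 16]))
                edges.append((i, j, c))
    return pts, edges


def average_degree(n: int, edges) -> float:
    """Average node degree 2|E| / n."""
    return 2.0 * len(edges) / n
```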
Table 1. Average excess from optimum of greedy spanner (in percent) for Euclidean
graphs.

Nodes  Dimension  t = 1.2  t = 1.4  t = 1.6  t = 1.8
20     5          0.90     7.61     16.22    21.99
20     10         0.00     0.83     9.55     33.03
20     20         0.00     0.00     0.62     18.45
20     25         0.00     0.00     0.31     13.68
30     5          1.63     10.88    20.11    -
30     10         0.00     1.20     13.36    47.93
30     20         0.00     0.01     1.00     32.97
30     25         0.00     0.00     0.29     17.97
40     5          2.50     13.00    -        -
40     10         0.01     1.43     15.39    -
40     20         0.00     0.00     ?.38     -
40     25         0.00     0.00     0.40     -
50     5          3.10     13.59    -        -
50     10         0.01     1.75     18.44    -
50     20         0.00     0.01     1.50     -
50     25         0.00     0.00     0.41     -



Table 2. Average excess from optimum of greedy spanner (in percent) for realistic
graphs. Results for − locality and + locality refer to graphs where edges are added
according to edge addition scheme 1 and 2, respectively, as explained in the text.

Edge cost  Avg. degree  Nodes  t = 1.1         t = 2.0         t = 4.0
                               − loc   + loc   − loc   + loc   − loc   + loc
Unit       4            16     0.00    0.00    7.63    16.84   13.45   9.66
Unit       4            32     0.00    0.00    5.35    15.30   -       -
Unit       4            64     0.00    0.00    -       -       -       -
Unit       8            16     0.00    0.00    -       -       -       -
Unit       8            32     0.00    0.00    -       -       -       -
Unit       8            64     0.00    0.00    -       -       -       -
Euclidean  4            16     0.05    0.41    3.25    3.84    1.31    1.43
Euclidean  4            32     0.02    0.12    2.22    2.73    6.29    -
Euclidean  4            64     0.02    0.09    1.29    2.32    -       -
Euclidean  8            16     0.57    1.43    4.71    4.85    2.10    -
Euclidean  8            32     0.23    1.40    8.48    -       -       -
Euclidean  8            64     0.14    1.00    4.35    -       -       -
Random     4            16     0.00    0.00    2.76    1.74    0.64    0.00
Random     4            32     0.00    0.00    1.17    0.90    4.72    -
Random     4            64     0.00    0.00    2.46    1.67    -       -
Random     8            16     0.00    0.00    4.22    5.74    2.64    3.95
Random     8            32     0.00    0.00    3.29    4.34    -       -
Random     8            64     0.00    0.00    3.33    3.71    -       -



For every class of graphs, we have created 50 instances, which we have solved 
by both the greedy spanner algorithm and our exact method with stretch factor 
t = 1.1, 2.0, 4.0. The tests have been carried out on a 933 MHz Intel Pentium III 
computer, allowing each instance 30 minutes of CPU time. 

In Tables 1 and 2 we show the greedy spanner's average excess from optimum
(in percent). Note the slightly different layout of these two tables, in particular
with respect to the number of nodes. For some instances we have compared the
greedy spanner with a lower bound on the minimum-weight spanner, since we
did not obtain the optimal value within the CPU time limit. This means that the
true average excess from optimum may be a bit lower than stated in the table;
the results for these test classes are written in italics. Some of the problem classes
have not been solved since the exact method was too time-consuming; these are marked with a dash.
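For clarity, the quantity reported in the tables can be written as a small function. We assume the usual definition, excess $= 100 \cdot (w_{\text{greedy}} - w_{\text{ref}}) / w_{\text{ref}}$, where the reference is the optimum or, when the time limit was hit, the best lower bound. Since a lower bound is at most the optimum, the second case can only overstate the excess, which is why such entries may exceed the true average.

```python
from typing import Iterable, Tuple


def excess_percent(greedy_weight: float, reference: float) -> float:
    """Percentage by which the greedy spanner's weight exceeds the reference value."""
    return 100.0 * (greedy_weight - reference) / reference


def average_excess(results: Iterable[Tuple[float, float]]) -> float:
    """Average excess over (greedy_weight, reference) pairs of one test class."""
    values = [excess_percent(g, r) for g, r in results]
    return sum(values) / len(values)
```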

The greedy algorithm performs significantly worse on Euclidean graphs than 
on realistic graphs. Also, the performance of the greedy algorithm decreases as 
the stretch factor increases. Interestingly, the greedy spanner does not necessarily 
become worse for larger problem instances; for almost all test classes considered, 
the average excess from optimum decreases as the instance size increases.
When local edges are favored in the realistic problem instances, the quality of 
the greedy spanner decreases slightly for unit and Euclidean costs. 
The running times depend on the size of the problem, the edge cost function
and the stretch factor. Naturally, the spanner problem becomes harder to solve 
for larger problem instances. It turns out that spanner problems with unit edge 
costs are considerably harder to solve by our exact method than spanner prob- 
lems with Euclidean or random costs. The difficulties for problem instances with 
unit cost are caused by the large number of paths of equal cost, which makes it 
hard for the branch and bound algorithm to choose the best path. Thus, we need 
to evaluate more branch and bound nodes to solve unit cost problem instances. 

We observed that our solution times increased when the stretch factor
increased. This is because the set of feasible paths between any two nodes
grows, potentially increasing the number of columns in our master problem.
On 75% of the test instances the lower bound provided in the root node was 
optimal. On the remaining 25% of the test instances, the lower bound in the 
root node was 6.4% from the optimum on average. This shows the quality of the 
proposed integer programming formulation. 

5 Conclusions 

In this paper we presented the first exact algorithm for the MWSP. Experiments
on a set of realistic problem instances from applications in message distribution
in networks were reported. Problem instances with up to 64 nodes have been
solved to optimality. The results show that the total weight of a spanner
constructed by the greedy spanner algorithm is typically within a few percent of
optimum for the set of realistic problem instances considered. Thus the greedy
spanner algorithm appears to be an excellent choice for constructing approximate
minimum-weight spanners in practice.

5.1 Generalizations of the MWSP 

The minimum-weight spanner problem may be seen as a special case of a larger 
class of network problems with shortest path constraints. This class of problems 
is normally hard to model due to the difficult shortest path constraints. 

The exact method we present in this paper can easily be modified to solve this 
larger class of network problems with shortest path constraints. Here we describe 
how our method can be applied to several generalizations of the MWSP. 

MWSP with variable stretch factor. Consider the MWSP as defined 
above, but with variable stretch factor for every pair of nodes. This is a 
generalization of the MWSP. Our method can easily be modified to solve 
this problem, since we already generate different variables for every pair of 
nodes satisfying the maximum cost constraint. Thus, it is possible to allow 
different cost constraints for every pair of nodes. 

MWSP with selected shortest path constraints. Consider the MWSP 
where we do not have maximum length constraints on all pairs of nodes. 
The solution to this problem may be a disconnected graph consisting of sev- 
eral smaller spanners. This generalization can also be used to model problems 
where there are maximum cost constraints between a root node and all other
nodes [18]. Our method can be modified to solve this problem by removing 
the covering constraints (1.3) for pairs of nodes (u, v) which have no shortest 
path constraints imposed. 
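Both generalizations only change the input to the pricing step. A minimal Python sketch with hypothetical names of our own: given the unconstrained shortest path lengths $c(p_{uv})$ and optional per-pair stretch factors $t_{uv}$, it returns the weight limit $B_{uv} = t_{uv} \cdot c(p_{uv})$ that would be passed to the CSPP for each constrained pair. Pairs that receive no limit are exactly those whose covering constraint (1.3) is dropped.

```python
from typing import Dict, Hashable, Optional, Tuple

Pair = Tuple[Hashable, Hashable]


def pair_budgets(shortest: Dict[Pair, float], stretch: Dict[Pair, float],
                 default_t: Optional[float] = None) -> Dict[Pair, float]:
    """Weight limit B_uv = t_uv * c(p_uv) for every constrained pair.

    A pair without its own stretch factor uses default_t; if default_t is None,
    the pair is left unconstrained and omitted from K.
    """
    budgets: Dict[Pair, float] = {}
    for pair, length in shortest.items():
        t = stretch.get(pair, default_t)
        if t is not None:
            budgets[pair] = t * length
    return budgets
```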



5.2 Future Work 

The main focus in this paper has been to evaluate the quality of the greedy algo- 
rithm, and to present a framework for a promising exact algorithm. It would be 
interesting to evaluate our algorithm on a wider set of problem instances, and to 
examine the practical performance of other heuristic spanner algorithms, includ- 
ing variants of the greedy algorithm (e.g., in which simple local improvements are 
made in the greedy spanner), $\Theta$-graphs and well-separated pair decompositions.

Clearly, there is still room for improvements on the exact algorithm. Provid- 
ing our exact method with a good upper bound computed by, e.g., the greedy 
algorithm will probably help to prune the branch and bound tree. By adding 
more variables in each column generation iteration the procedure may be ac- 
celerated. Finally, column generation stabilization [24] will most likely have an 
accelerating effect. 
Abstract. We propose the linear axis, a new skeleton for polygonal
shapes. It is related to the medial axis and the straight skeleton, being 
the result of a wavefront propagation process. The wavefront is linear and 
propagates by translating edges at constant speed. The initial wavefront 
is an altered version of the original polygon: zero-length edges are added 
at reflex vertices. The linear axis is a subset of the straight skeleton of the 
altered polygon. In this way, the counter-intuitive effects in the straight 
skeleton caused by sharp reflex vertices are alleviated. We introduce the 
notion of $\varepsilon$-equivalence between two skeletons, and give an algorithm
that computes the number of zero-length edges for each reflex vertex
which yield a linear axis $\varepsilon$-equivalent to the medial axis. This linear axis
and thus the straight skeleton can be computed from the medial axis in 
linear time for polygons with a constant number of “nearly co-circular” 
sites. All previous algorithms for straight skeleton computation are sub- 
quadratic. 



1 Introduction 

Skeletons have long been recognized as an important tool in computer graphics, 
computer vision and medical imaging. The best-known and most widely used skeleton
is the medial axis [1]. Variations of it include smoothed local symmetries [2] and 
the process inferring symmetric axis [3]. One way to define the medial axis of a 
polygon P is as the locus of centers of maximally inscribed disks. Another way to 
describe it is as the locus of singularities of a propagating wavefront, whose points 
move at constant, equal speed. The medial axis is also a subset of the Voronoi 
diagram whose sites are the line segments and the reflex vertices of P. The medial 
axis has always been regarded as an attractive shape descriptor. However, the 
presence of parabolic arcs may constitute a disadvantage for the medial axis. 
Various linear approximations were therefore proposed. One approach is to use 
the Voronoi diagram of a sampled set of boundary points [4].

The straight skeleton is defined only for polygonal figures and was introduced 
by Aichholzer and Aurenhammer [5]. As its name suggests, it is composed only 
of line segments. Also, the straight skeleton has a lower combinatorial complexity
than the medial axis. Like the medial axis, the straight skeleton is defined by a
wavefront propagation process. In this process, edges of the polygon move inward
at a fixed rate.

Fig. 1. A decomposition [7] based on split events of the straight line skeleton gives
counter-intuitive results if the polygon contains sharp reflex vertices.

In [7] a polygon decomposition based on the straight skeleton was presented. 
The results obtained from the implementation indicated that this technique pro- 
vides plausible decompositions for a variety of shapes. However, sharp reflex an- 
gles have a big impact on the form of the straight skeleton, which in turn has a 
large effect on the decomposition. Figure 1 shows three examples of the decom- 
position based on wavefront split events. It clearly shows that the sharp reflex 
vertices cause a counterintuitive decomposition. The reason why sharp reflex
vertices have a big impact on the form of the straight skeleton is that points in
the defining wavefront close to such reflex vertices move much faster than other
wavefront points. In contrast, points in the wavefront defining the medial axis
move at equal, constant speed (but this leads to the presence of parabolic arcs).
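The speed effect can be quantified with elementary geometry. The Python sketch below is our own illustration, not taken from this paper: a wavefront vertex joining two edges that translate inward at unit speed moves at speed $1/\sin(\alpha/2)$, where $\alpha$ is the interior angle at the vertex, and this grows without bound as a reflex angle approaches $2\pi$. The second function assumes, purely for illustration, that $k$ zero-length edges are inserted at a reflex vertex with orientations that split its turn $\alpha - \pi$ into $k + 1$ equal parts; each resulting wavefront vertex then moves at speed $1/\cos((\alpha - \pi)/(2(k + 1)))$.

```python
import math


def vertex_speed(interior_angle: float) -> float:
    """Speed of a wavefront vertex between two unit-speed edges.

    interior_angle is in radians, strictly between 0 and 2*pi.
    """
    return 1.0 / math.sin(interior_angle / 2.0)


def reflex_speed_with_hidden_edges(interior_angle: float, k: int) -> float:
    """Vertex speed at a reflex vertex after inserting k zero-length edges,
    assuming they split the turn (interior_angle - pi) into k + 1 equal parts."""
    turn = (interior_angle - math.pi) / (k + 1)
    return 1.0 / math.cos(turn / 2.0)
```

For a sharp reflex vertex with $\alpha = 350°$, `vertex_speed(math.radians(350))` is about 11.47, whereas under the equal-split assumption $k = 3$ hidden edges reduce the speed to about 1.07.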



1.1 Contribution 

(i) This paper presents the linear axis, a new skeleton for polygons. It is based 
on a linear propagation, like in the straight skeleton. The speed of the wave- 
front points originating from a reflex vertex is decreased by inserting in the 
initial wavefront at each reflex vertex a number of zero-length edges, called hid- 
den edges. In the propagation, wavefront edges translate at constant speed in 
a self-parallel manner. The linear axis is the trace of the convex vertices of the 
propagating wavefront. It is therefore a subset of the straight skeleton of an 
altered version of the polygon. 

(ii) Properties of the linear axis are given in Section 3. We compute upper
bounds on the speed of the points in the propagating wavefront, which also allow
us to identify localization constraints for the skeletal edges, in terms of parabolas,
hyperbolas and Apollonius circles. These properties also prove that the larger the
number of hidden edges, the better the linear axis approximates the medial axis.

(iii) A thorough analysis of the relation between the number of inserted
hidden edges and the quality of this approximation is given in section 4. We
introduce the notion of ε-equivalence between the two skeletons. Nodes in the
two skeletons are clustered based on a proximity criterion, and the ε-equivalence
between the two skeletons is defined as an isomorphism between the resulting
graphs with clusters as vertices. This allows us to compare skeletons based on
their main topological structure, ignoring local detail. We next give an algorithm
for computing a number of hidden edges for each reflex vertex such that the
resulting linear axis is ε-equivalent to the medial axis.

(iv) This linear axis can then be computed from the medial axis. The whole
linear axis computation takes linear time for polygons with a constant number
of nodes in each cluster. Only a limited category of polygons lacks this
property.

(v) Experimental verification shows that in practice only a few hidden edges
are necessary to yield a linear axis that is ε-equivalent to the medial axis.
Moreover, the resulting decomposition is much more intuitive than the
decomposition based on the straight skeleton.

2 Skeletons from Propagation 

Let P be a simple, closed polygon. The medial axis M(P) is closely related to the
Voronoi diagram VD(P) of the polygon P. The set of sites defining VD(P) is
the set of line segments and reflex vertices of P. The diagram VD(P) partitions
the interior of P into Voronoi cells, which are regions with one closest site. The
set of points U(d) inside P having some fixed distance d to the polygon is
called a uniform wavefront. If S is an arbitrary site of P, let U_S(d) denote the
set of points in U(d) closest to S. We call U_S(d) the (uniform) offset of S. The
uniform wavefront is composed of straight line segments (offsets of line segments)
and circular arcs (offsets of reflex vertices) (see figure 2(a)). As the distance
d increases, the wavefront points move at equal, constant velocity along the
normal direction. This process is called uniform wavefront propagation. During
the propagation, the breakpoints between adjacent offsets trace the Voronoi
diagram VD(P). The medial axis M(P) is a subset of VD(P); the Voronoi
edges incident to the reflex vertices are not part of the medial axis.

The region VC(S) swept in the propagation by the offset of site S is the
Voronoi cell of S. It is also the set of points inside P whose closest site is S;
the distance d(x, S) of a point x to S is the length of the shortest path inside P
from x to S. There are two types of edges in M(P): line segments (separating
the Voronoi cells of two line segments or two reflex vertices) and parabolic arcs
(separating the Voronoi cells of a line segment and a reflex vertex). A branching
node b ∈ M(P) is a node of degree at least three, and the sites whose Voronoi
cells intersect in b are referred to as the generator sites of b. The medial axis
also contains nodes of degree two, which will be referred to as regular nodes.
They are breakpoints between parabolic arcs and line segments.

The straight skeleton is also defined as the trace of adjacent components of a
propagating wavefront. The wavefront is linear and is obtained by translating the
edges of the polygon in a self-parallel manner, keeping sharp corners at reflex
vertices. In contrast to the medial axis, edges incident to a reflex vertex
grow in length (see figure 2(b)). This also means that points in the wavefront
move at different speeds: wavefront points originating from reflex vertices move
faster than points originating from the edges incident to those vertices. In fact,
the speed of the wavefront points originating from a reflex vertex becomes arbitrarily
Fig. 2. The medial axis (a), the straight skeleton (b) and the linear axis in the case
when one hidden edge is inserted at each reflex vertex (c). Instances of the propagating
wavefront generating the skeletons are drawn with dotted line style in all cases. In (a)
the Voronoi edges that are not part of the medial axis are in dashed line style. In (c)
the dashed lines are the bisectors that are not part of the linear axis.



large when the internal angle of the vertex gets arbitrarily close to 2π. Wavefront
vertices move on angular bisectors of wavefront edges. The straight skeleton
SS(P) of P is the trace, in the propagation, of the vertices of the wavefront.
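The bisector motion described above can be made concrete. The sketch below is a minimal illustration, assuming each wavefront edge is given by its inward unit normal and translates at unit speed; the function names are hypothetical, not taken from the paper. A vertex shared by two edges must stay on both moving supporting lines, which places it on their angular bisector and gives it speed 1/sin(α/2) for internal angle α, a value that grows without bound as α approaches 2π.

```python
import math

def vertex_velocity(n1, n2):
    """Velocity of a wavefront vertex shared by two edges with inward unit
    normals n1 and n2, when each edge translates self-parallel at unit speed.

    The vertex must stay on both moving supporting lines, i.e. v.n1 = 1 and
    v.n2 = 1; the solution v = (n1 + n2) / (1 + n1.n2) lies on the angular
    bisector of the two edges.
    """
    dot = n1[0] * n2[0] + n1[1] * n2[1]
    if dot <= -1.0 + 1e-12:
        raise ValueError("opposite normals: the vertex speed is unbounded")
    return ((n1[0] + n2[0]) / (1.0 + dot), (n1[1] + n2[1]) / (1.0 + dot))

def vertex_speed(alpha):
    """Speed of a straight-skeleton wavefront vertex with internal angle alpha."""
    return 1.0 / math.sin(alpha / 2.0)

# Right-angled corners (convex or reflex) move at sqrt(2); sharper reflex
# corners speed up dramatically as alpha approaches 2*pi.
for alpha in (0.5 * math.pi, 1.5 * math.pi, 1.9 * math.pi, 1.99 * math.pi):
    print(f"alpha = {alpha / math.pi:.2f} pi -> speed {vertex_speed(alpha):.2f}")
```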

3 Linear Axis — Definition and Properties 

In this section we define the linear axis. It is based on a linear wavefront
propagation like the straight skeleton, but the discrepancy in the speed of the
points in the propagating wavefront, though never zero, can be made arbitrarily
small. Also, as follows from lemma 2, the speed of the points of this wavefront is
bounded from above by √2.

Let {v_1, v_2, ..., v_n} denote the vertices of a simple polygon P, and let
k = (k_1, k_2, ..., k_n) be a sequence of natural numbers. If v_i is a convex vertex
of P, k_i = 0, and if it is a reflex vertex, k_i ≥ 0. Let 𝒫^k(0) be the polygon obtained
from P by replacing each reflex vertex v_i with k_i + 1 identical vertices, the endpoints
of k_i zero-length edges, which will be referred to as the hidden edges associated
with v_i. The directions of the hidden edges are chosen such that the reflex vertex
v_i of P is replaced in 𝒫^k(0) by k_i + 1 “reflex vertices” of equal internal angle.
Let 𝒫^k(t) denote the linear wavefront, corresponding to a sequence k of hidden
edges, at moment t. The propagation process consists of translating edges at
constant unit speed in a self-parallel manner, i.e. it is the propagation of the
wavefront defining the straight skeleton of 𝒫^k(0).
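One way to realize the hidden edges of 𝒫^k(0) is to split the turning angle at v_i into k_i + 1 equal parts, so that each of the k_i + 1 replacement vertices turns by the same amount and therefore has the same internal angle. The sketch assumes the vertices are listed counter-clockwise, so the turn at a reflex vertex is negative; the helper names are illustrative, not from the paper.

```python
import math

def signed_turn(d_in, d_out):
    """Signed turning angle in (-pi, pi] from direction d_in to direction d_out."""
    cross = d_in[0] * d_out[1] - d_in[1] * d_out[0]
    dot = d_in[0] * d_out[0] + d_in[1] * d_out[1]
    return math.atan2(cross, dot)

def hidden_edge_directions(prev_pt, v, next_pt, k):
    """Unit directions of k zero-length hidden edges inserted at vertex v of a
    counter-clockwise polygon.  The total turn at v is split into k + 1 equal
    turns, so v is replaced by k + 1 vertices of equal internal angle."""
    d_in = (v[0] - prev_pt[0], v[1] - prev_pt[1])
    d_out = (next_pt[0] - v[0], next_pt[1] - v[1])
    theta = math.atan2(d_in[1], d_in[0])
    step = signed_turn(d_in, d_out) / (k + 1)  # negative at a reflex vertex (CCW)
    return [(math.cos(theta + m * step), math.sin(theta + m * step))
            for m in range(1, k + 1)]

# A reflex vertex of internal angle 3*pi/2: heading east, then turning south.
print(hidden_edge_directions((-1.0, 0.0), (0.0, 0.0), (0.0, -1.0), 1))
```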

Definition 1. The linear axis L^k(P) of P, corresponding to a sequence k of
hidden edges, is the trace of the convex vertices of the linear wavefront in the
above propagation process.

Clearly, L^k(P) is a subset of SS(𝒫^k(0)); we only have to remove the
bisectors traced by the reflex vertices of the wavefront (see figure 2(c)). For the
rest of this paper, we assume that each selection k of hidden edges under
consideration satisfies the condition in the following lemma.
Fig. 3. (a) The linear offset (solid line style) of a reflex vertex with 3 associated hidden
edges is made of 5 segments tangent to the uniform offset (dotted line style) of this
vertex. (b) The linear offsets of reflex vertex v_j and edge e trace, in the propagation,
a polygonal chain that lies in the region between parabola P(v_j, e) and hyperbola
H^{s_j}(v_j, e). (c) The linear offsets of the reflex vertices v_i and v_j trace a polygonal chain
that lies outside the Apollonius circles C^{s_j}(v_j, v_i) and C^{s_i}(v_i, v_j).



Lemma 1. If every reflex vertex v_j of internal angle α_j > 3π/2 has at least one
associated hidden edge, then L^k(P) is an acyclic connected graph.

A site of P is a line segment or a reflex vertex of the polygon. If S is an
arbitrary site of P, we denote by 𝒫^k_S(t) the points in 𝒫^k(t) originating from S.
We call 𝒫^k_S(t) the (linear) offset of site S at moment t. Figure 3(a) illustrates
the linear and the uniform offsets of a reflex vertex v_j with k_j = 3 associated
hidden edges.

Individual points in 𝒫^k(t) move at different speeds. The fastest moving points
in 𝒫^k(t) are its reflex vertices. The slowest moving points have unit speed (this
also means that we assume unit speed for the points in the uniform wavefront). The
next lemma gives bounds on the speed of the points in the linear wavefront 𝒫^k.
Let v_j be a reflex vertex of P, of internal angle α_j, with k_j associated hidden
edges. Let 𝒫^k_{v_j}(t) be the offset of v_j at some moment t.

Lemma 2. Points in 𝒫^k_{v_j}(t) move at speed at most

$$ s_j = \frac{1}{\cos\left(\frac{\alpha_j - \pi}{2(k_j + 1)}\right)}. $$
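Splitting a reflex vertex of internal angle α_j into k_j + 1 vertices that each turn by (α_j − π)/(k_j + 1) gives each of them speed 1/cos((α_j − π)/(2(k_j + 1))). The sketch below evaluates this bound together with the choice of one hidden edge exactly when α_j > 3π/2 (the rule section 4.1 uses for the initial speeds), and checks numerically over a grid of reflex angles that the resulting speeds never exceed √2. Function names are hypothetical.

```python
import math

def reflex_offset_speed(alpha, k):
    """Maximal speed of the linear offset of a reflex vertex with internal
    angle alpha (pi < alpha < 2*pi) and k hidden edges: the vertex is split
    into k + 1 vertices that each turn by (alpha - pi) / (k + 1)."""
    if not math.pi < alpha < 2 * math.pi:
        raise ValueError("alpha must be a reflex angle")
    return 1.0 / math.cos((alpha - math.pi) / (2 * (k + 1)))

def initial_hidden_edges(alpha):
    """Condition of lemma 1: one hidden edge when alpha > 3*pi/2, otherwise none."""
    return 1 if alpha > 1.5 * math.pi else 0

# Worst initial speed over a fine grid of reflex angles.
worst = max(reflex_offset_speed(a, initial_hidden_edges(a))
            for a in (math.pi * (1 + i / 1000) for i in range(1, 1000)))
print(f"largest initial speed: {worst:.6f} (sqrt(2) = {math.sqrt(2):.6f})")
```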




The linear axis L^k(P) is the trace of the convex vertices of the propagating
linear wavefront. Each convex vertex of 𝒫^k(t) is a breakpoint between two linear
offsets. Lemmas 3 and 4 describe the trace of the intersection of two adjacent
offsets in the linear wavefront propagation. They are central to the algorithms
in section 4. First, we establish the needed terminology. If v is a vertex and e
is an edge not incident to v, we denote by P(v, e) the parabola with focus
v and directrix the line supporting e. For any real value r > 1, we denote by
H^r(v, e) the locus of points whose distances to v and to the line supporting e are in
constant ratio r. This locus is a hyperbola branch. If u and v are reflex vertices,
we denote by C^r(u, v) the Apollonius circle associated with u and v and ratio
r ≠ 1. C^r(u, v) is the locus of points whose distances to u and v, respectively,
are in constant ratio r.
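For computations it helps to have the Apollonius circle in explicit form. Expanding |x − u|² = r²|x − v|² gives centre (u − r²v)/(1 − r²) and radius r|u − v|/|1 − r²|; for r > 1 the circle encloses v, and its inside is where |x − u| > r|x − v|. A minimal sketch (the function name is hypothetical):

```python
import math

def apollonius_circle(u, v, r):
    """Centre and radius of C^r(u, v) = {x : |x - u| = r * |x - v|}, r > 0, r != 1."""
    if r <= 0 or abs(r - 1.0) < 1e-12:
        raise ValueError("r must be positive and different from 1")
    r2 = r * r
    centre = ((u[0] - r2 * v[0]) / (1.0 - r2), (u[1] - r2 * v[1]) / (1.0 - r2))
    radius = r * math.dist(u, v) / abs(1.0 - r2)
    return centre, radius

# With u = (0, 0), v = (1, 0) and r = 2 the circle is centred at (4/3, 0)
# with radius 2/3, passing through (2, 0) and (2/3, 0).
print(apollonius_circle((0.0, 0.0), (1.0, 0.0), 2.0))
```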

Let e be an edge and v_j be a reflex vertex of P (see figure 3(b)). The points
in the offset 𝒫^k_e move at unit speed, while the points in the offset 𝒫^k_{v_j} move
at speeds varying in the interval [1, s_j], where s_j is given by lemma 2.

Lemma 3. If the linear offsets of v_j and e become adjacent, the trace of their
intersection is a polygonal chain that satisfies:

1) it lies in the region between P(v_j, e) and H^{s_j}(v_j, e);

2) the lines supporting its segments are tangent to P(v_j, e); the tangent points
lie on the traces of the unit speed points in 𝒫^k_{v_j};

3) H^{s_j}(v_j, e) passes through those vertices of the chain which lie on the trace of
a reflex vertex of 𝒫^k_{v_j}.

Let v_i and v_j be reflex vertices of P (see figure 3(c)). The points in the offset
𝒫^k_{v_i} move at speeds varying in the interval [1, s_i], while the points in the offset
𝒫^k_{v_j} move at speeds varying in the interval [1, s_j].

Lemma 4. If the linear offsets of v_i and v_j become adjacent, the trace of
their intersection is a polygonal chain that lies outside the Apollonius circles
C^{s_i}(v_i, v_j) and C^{s_j}(v_j, v_i).
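In distance-ratio form, the localization constraints of lemmas 3 and 4 become simple tests: a point x on the chain traced by v_j and e satisfies 1 ≤ d(x, v_j)/d(x, e) ≤ s_j, and a point on the chain traced by v_i and v_j satisfies d(x, v_i) ≤ s_i·d(x, v_j) and d(x, v_j) ≤ s_j·d(x, v_i). The sketch below checks these conditions with Euclidean distances and a small tolerance; it is an illustration under those assumptions, not the authors' code, and the function names are hypothetical.

```python
import math

def dist_to_line(x, a, b):
    """Distance from x to the line through a and b (the line supporting edge e)."""
    cross = (b[0] - a[0]) * (x[1] - a[1]) - (b[1] - a[1]) * (x[0] - a[0])
    return abs(cross) / math.dist(a, b)

def satisfies_lemma3(x, v, a, b, s, tol=1e-9):
    """x lies between the parabola P(v, e) (ratio 1) and the hyperbola
    branch H^s(v, e) (ratio s), where e is the edge from a to b."""
    ratio = math.dist(x, v) / dist_to_line(x, a, b)
    return 1.0 - tol <= ratio <= s + tol

def satisfies_lemma4(x, vi, vj, si, sj, tol=1e-9):
    """x lies outside C^si(vi, vj) and C^sj(vj, vi); for si > 1 the inside of
    C^si(vi, vj) is exactly where |x - vi| > si * |x - vj|."""
    di, dj = math.dist(x, vi), math.dist(x, vj)
    return di <= si * dj + tol and dj <= sj * di + tol

# Edge e on the x-axis and reflex vertex v = (0, 2): (0, 1) is on the parabola.
print(satisfies_lemma3((0.0, 1.0), (0.0, 2.0), (-5.0, 0.0), (5.0, 0.0), 1.2))
```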

4 Linear Axis, Topologically Similar to the Medial Axis 

Obviously, the larger the number of hidden edges associated with the reflex
vertices, the more closely the corresponding linear axis approximates the medial axis.
For many applications of the medial axis, an approximation that preserves the
main visual cues of the shape, though perhaps not isomorphic to the medial
axis, is perfectly adequate. The ε-equivalence between the medial axis and the
linear axis defined below allows us to compute a linear axis that closely
approximates the medial axis using only a small number of hidden edges.

Let ε > 0; an ε-edge is a Voronoi edge generated by four almost co-circular
sites (see figure 4(a)). Let b_ib_j be a Voronoi edge, with b_i generated by sites S_k,
S_i, S_l, and b_j generated by S_k, S_j, S_l.

Definition 2. The edge b_ib_j is an ε-edge if d(b_i, S_j) < (1 + ε) d(b_i, S_i) or
d(b_j, S_i) < (1 + ε) d(b_j, S_j).
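Definition 2 can be checked directly once distances to sites are available. This sketch assumes Euclidean distances (which agree with the inside-P distance whenever the straight segment from the point to its closest point on the site stays inside P) and encodes a site either as a reflex vertex or as a line segment; that encoding and the function names are hypothetical.

```python
import math

def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def site_distance(p, site):
    """site is ("vertex", q) or ("segment", a, b)."""
    if site[0] == "vertex":
        return math.dist(p, site[1])
    return dist_point_segment(p, site[1], site[2])

def is_eps_edge(bi, bj, site_i, site_j, eps):
    """Definition 2 for the Voronoi edge bi-bj, where bi is generated by
    S_k, S_i, S_l and bj by S_k, S_j, S_l."""
    return (site_distance(bi, site_j) < (1 + eps) * site_distance(bi, site_i)
            or site_distance(bj, site_i) < (1 + eps) * site_distance(bj, site_j))
```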

A path between two nodes of M(P) is an ε-path if it is made only of ε-edges.
For any node b of M(P), a node b' such that the path between b and b' is an
ε-path is called an ε-neighbour of b. Let N_ε(b) be the set of ε-neighbours of b.
The set {b} ∪ N_ε(b) is called an ε-cluster.
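Since the medial axis of a simple polygon is a tree, the path between two nodes is unique, and N_ε(b) is simply the set of nodes reachable from b using ε-edges only. A breadth-first search sketch, assuming M(P) is stored as an adjacency map whose entries flag ε-edges (a hypothetical representation):

```python
from collections import deque

def eps_neighbours(adj, b):
    """N_eps(b): nodes reachable from b along eps-edges only.

    adj maps a node to a list of (neighbour, is_eps_edge) pairs.  Because the
    medial axis is a tree, reachability along eps-edges is the same as
    'the (unique) path from b is an eps-path'."""
    seen = {b}
    queue = deque([b])
    while queue:
        u = queue.popleft()
        for w, is_eps in adj[u]:
            if is_eps and w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(b)
    return seen

def eps_cluster(adj, b):
    """The eps-cluster {b} together with N_eps(b)."""
    return {b} | eps_neighbours(adj, b)
```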

Let (V_M, E_M) be the graph induced by M(P) on the set of vertices V_M
composed of the convex vertices of P and the nodes of degree 3 of M(P). Let
(V_{L^k}, E_{L^k}) be the graph induced by L^k(P) on the set of vertices V_{L^k} composed
of the convex vertices of P and the nodes of degree 3 of L^k(P).
Definition 3. M(P) and L^k(P) are ε-equivalent if there exists a surjection
f: V_M → V_{L^k} such that:

1) f(p) = p for all convex vertices p of P;

2) for all b_i, b_j ∈ V_M with b_j ∉ N_ε(b_i), there is an arc in E_M connecting b_i and b_j
if and only if there is an arc in E_{L^k} connecting f(b'_i) and f(b'_j), where
b'_i ∈ {b_i} ∪ N_ε(b_i) and b'_j ∈ {b_j} ∪ N_ε(b_j).



The following lemma gives a sufficient condition for the ε-equivalence of the
two skeletons. The path between two disjoint Voronoi cells VC(S_i) and VC(S_j)
is the shortest path in M(P) between a point of VC(S_i) ∩ M(P) and a point of
VC(S_j) ∩ M(P).

Lemma 5. If the only sites whose linear offsets become adjacent in the propagation
are sites whose uniform offsets trace an edge in the medial axis M(P),
or sites whose path between their Voronoi cells is an ε-path, then the linear axis
and the medial axis are ε-equivalent.



4.1 Computing a Sequence of Zero-Length Edges 

We now describe an algorithm for computing a sequence k of hidden edges such
that the resulting linear axis is ε-equivalent to the medial axis. As lemma 5
indicates, the linear axis and the medial axis are ε-equivalent if only the linear
offsets of certain sites become adjacent in the propagation, namely those with
adjacent Voronoi cells and those whose path between their Voronoi cells is an
ε-path. The algorithm handles pairs of sites whose linear offsets must remain
disjoint at every moment in order to ensure the ε-equivalence of the two skeletons.
These are sites with disjoint Voronoi cells whose path between these Voronoi cells
is not an ε-path. However, we do not have to consider each such pair. The algorithm
actually handles the pairs of conflicting sites, where two sites S_i and S_j (at least
one being a reflex vertex) are said to be conflicting if the path between their
Voronoi cells contains exactly one non-ε-edge. When handling a pair S_i, S_j, we
check and, if necessary, adjust the maximal speeds s_i and s_j of the offsets of
S_i and S_j, respectively, so that these offsets remain disjoint in the propagation.
This is done by looking locally at the configuration of the uniform wavefront
and using the localization constraints for the edges of the linear axis given by
lemmas 3 and 4. More insight into this process is provided below.
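The conflict test can be phrased as a graph query: find the path between the two cells on M(P) and count its non-ε-edges. The sketch below approximates VC(S_i) ∩ M(P) by the medial-axis nodes it contains and runs a multi-source Dijkstra search from those nodes to the nodes of VC(S_j); the graph encoding, edge lengths and function names are assumptions for illustration.

```python
import heapq
import itertools

def path_between_cells(adj, cell_i, cell_j):
    """Shortest path in the medial-axis graph from a node of cell_i to a node
    of cell_j, as a list of (u, w, is_eps) edges, or None if unreachable.
    adj maps a node to a list of (neighbour, length, is_eps_edge) triples."""
    counter = itertools.count()          # tie-breaker so nodes are never compared
    dist, prev, heap = {}, {}, []
    for n in cell_i:
        dist[n], prev[n] = 0.0, None
        heap.append((0.0, next(counter), n))
    heapq.heapify(heap)
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                     # stale heap entry
        if u in cell_j:                  # reconstruct the path back to cell_i
            edges = []
            while prev[u] is not None:
                p, is_eps = prev[u]
                edges.append((p, u, is_eps))
                u = p
            return edges[::-1]
        for w, length, is_eps in adj[u]:
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, (u, is_eps)
                heapq.heappush(heap, (nd, next(counter), w))
    return None

def are_conflicting(adj, cell_i, cell_j, i_is_reflex, j_is_reflex):
    """At least one reflex vertex, and exactly one non-eps-edge on the path."""
    if not (i_is_reflex or j_is_reflex):
        return False
    path = path_between_cells(adj, cell_i, cell_j)
    return path is not None and sum(1 for _, _, e in path if not e) == 1
```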



Handling conflicting sites. Let S_i and S_j be a pair of conflicting sites. Let
b_i ∈ VC(S_i) and b_j ∈ VC(S_j) denote the endpoints of the path between their
Voronoi cells. The handling of this pair depends on the composition of the path
between VC(S_i) and VC(S_j), as well as on the type (reflex vertex or segment)
of S_i and S_j. A detailed case analysis of all possible combinations is not possible
due to space limitations. For this reason, we present here only the structure of
the case analysis and detail the handling in only two of the cases; each case
requires a different handling. A complete description can be found in [8].
A. The path between the Voronoi cells VC(S_i) and VC(S_j) is made of one
non-ε-edge. Let S_k and S_l denote the sites whose uniform offsets trace this edge.
When handling the pair S_i, S_j, we look locally at the configuration of the uniform
wavefront shortly before the moment the Voronoi edge b_ib_j starts being traced
in the propagation. We take into consideration not only the type of the sites S_i
and S_j, but also the type (regular node or branching node) of b_i and b_j.

A.1. S_i is a reflex vertex and S_j is a line segment. Regarding b_i and b_j we have
the following cases:

- b_i is a branching node and b_j is a branching or a regular node of M(P)
(see figure 4(b)). The handling of the pair S_i, S_j in this case is given by:

Lemma 6. If the speed of the points in the linear offset of S_i is at most
d(b_j, S_i)/d(b_j, S_j), the linear offsets of S_i and S_j are at any moment disjoint.

- b_i is a regular node occurring sooner than b_j, which is a branching node.

- b_i is a regular node occurring later than b_j, which is a regular or branching
node.

A.2. S_i and S_j are reflex vertices. Regarding b_i and b_j we distinguish the
following cases:

- b_i and b_j are branching points. Let b be the intersection of the perpendicular
bisector B_ij of S_i and S_j with the edge connecting b_i and b_j (see
figure 4(c)). The handling of the pair S_i, S_j in this case is indicated by:

Lemma 7. If the speed of the points in the linear offset of S_i is at most
d(b, S_i)/d(b, S_k) and the speed of the points in the linear offset of S_j is at most
d(b, S_j)/d(b, S_k), the linear offsets of S_i and S_j are at any moment disjoint.

- b_i is a regular node occurring sooner than b_j, which is a branching node.

- b_i is a regular node occurring later than b_j, which is a branching node.

- b_i and b_j are regular nodes.

B. The path between the Voronoi cells VC(S_i) and VC(S_j) is made of at least
two edges, all but one of which are ε-edges. A similar case description is
omitted due to space limitations.



Algorithm. The algorithm for computing a sequence k of hidden edges can
now be summarized as follows.

Algorithm ComputeHiddenEdges(P, ε)

Input: A simple polygon P and a constant ε.

Output: The number of hidden edges for each reflex vertex such that the resulting
linear axis is ε-equivalent to the medial axis.

0. Compute the medial axis M of P.

1. For each reflex vertex S_j of P:
   if α_j > 3π/2 then s_j ← 1/cos((α_j − π)/4)
   else s_j ← 1/cos((α_j − π)/2)

2. ComputeConflictingSites(ε)

3. For each pair of conflicting sites S_i, S_j:
   lower s_i and/or s_j to the bound(s) of the matching case (A.1.1 to B) if they exceed them

4. For each reflex vertex S_j of P:
   k_j ← ⌊(α_j − π)/(2 arccos(1/s_j))⌋

In step 1, for each reflex vertex S_j we initialize the maximal speed s_j of the
reflex vertices in its linear offset. If the angle α_j of S_j is greater than 3π/2,
we associate one hidden edge with S_j; otherwise the initial number k_j of hidden
edges associated with S_j is 0 (see section 3). The value of s_j is then given by lemma
2. In handling a pair of conflicting sites S_i, S_j in step 3, we use the bounds given
by lemmas similar to lemmas 6 and 7. After identifying which of the cases
A.1.1 to B the conflicting pair falls into, we check whether the current speeds s_i
and/or s_j satisfy the condition(s) in the corresponding lemma. If they do not,
we adjust s_i and/or s_j to the bound(s) given by the lemma. Finally, in step 4, the
number k_j of hidden edges associated with each reflex vertex S_j is determined.
It is the smallest number of hidden edges such that the speed of the vertices in
the linear offset of S_j is at most s_j.
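Steps 1, 3 and 4 reduce to bookkeeping on the speeds s_j. The sketch below assumes the per-pair bounds of step 3 are already known and are passed in as (vertex, bound) pairs, since deriving them requires the full case analysis of [8]; the function names are hypothetical. Step 4 is computed by a direct search for the smallest k_j, which for a non-integer ratio (α_j − π)/(2 arccos(1/s_j)) coincides with taking the floor of that ratio.

```python
import math

def offset_speed(alpha, k):
    """Speed bound for a reflex vertex of internal angle alpha with k hidden edges."""
    return 1.0 / math.cos((alpha - math.pi) / (2 * (k + 1)))

def compute_hidden_edges(reflex_angles, speed_bounds):
    """reflex_angles: {vertex id: alpha_j} with pi < alpha_j < 2*pi.
    speed_bounds: (vertex id, bound) pairs standing in for the bounds that
    the case analysis yields for conflicting pairs (not computed here)."""
    # Step 1: initial maximal speeds (one hidden edge iff alpha_j > 3*pi/2).
    s = {j: offset_speed(a, 1 if a > 1.5 * math.pi else 0)
         for j, a in reflex_angles.items()}
    # Step 3: lower s_j wherever it violates a bound.
    for j, bound in speed_bounds:
        if bound <= 1.0:
            raise ValueError("speed bounds must exceed 1")
        s[j] = min(s[j], bound)
    # Step 4: smallest k_j whose offset speed does not exceed s_j
    # (a tiny relative tolerance absorbs floating-point round-off).
    k = {}
    for j, a in reflex_angles.items():
        kj = 0
        while offset_speed(a, kj) > s[j] * (1 + 1e-12):
            kj += 1
        k[j] = kj
    return k

angles = {"v1": 1.9 * math.pi, "v2": 1.25 * math.pi}
print(compute_hidden_edges(angles, [("v1", 1.1)]))
```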

Regarding the computation of the conflicting sites in step 2, we note that
each edge of M(P) not incident to P is a path between two Voronoi cells. Thus
each non-ε-edge of M(P) not incident to P defines two possibly conflicting
sites; whether they are indeed conflicting depends on the presence of at least
one reflex vertex in the pair. If at least one endpoint of the non-ε-edge
is part of an ε-cluster, there may be other conflicting sites whose path between
their Voronoi cells contains this edge. Their number depends on the number of
nodes in the ε-clusters of the endpoints of the non-ε-edge.

Theorem 1. Algorithm ComputeHiddenEdges computes a sequence of hidden
edges that leads to a linear axis ε-equivalent to the medial axis.

The performance of the algorithm ComputeHiddenEdges depends on the
number of conflicting pairs computed in step 2. This in turn depends on the
number of nodes in the ε-clusters of M(P). If every ε-cluster of M(P) has a



Fig. 4. (a) An ε-edge b_ib_j is a Voronoi edge generated by four almost co-circular sites.
(b)-(c) Handling the conflicting sites S_i and S_j when the path between their Voronoi
cells consists of one edge. An instance of the uniform wavefront is drawn in dotted
line style, the medial axis in solid line style, and the dashed lines give the localization
constraints for the edges in the linear axis.




constant number of nodes, there is only a linear number of conflicting pairs,
since there is a linear number of non-ε-edges in M(P). Each conflicting pair is
handled in constant time; thus, in this case, ComputeHiddenEdges computes the
sequence k in linear time. Only a limited class of shapes has a constant
number of clusters, each with a linear number of nodes.

4.2 Computation of the Linear Axis 

Once we have a sequence k that ensures the ε-equivalence between the corresponding
linear axis and the medial axis, we can construct this linear axis. The
medial axis can be computed in linear time [9]. Despite the similarity of the
straight skeleton to the medial axis, the fastest known algorithms for computing
the straight skeleton are slower. The first sub-quadratic algorithm was proposed
by Eppstein and Erickson [10]. It runs in O(n^{1+ε} + n^{8/11+ε} r^{9/11+ε}) time with a
similar space complexity, where r is the number of reflex vertices and ε is an
arbitrarily small positive constant. A more recent algorithm by Cheng and Vigneron [11]
computes the straight skeleton of a non-degenerate simple polygon in O(n log² n + r√r log r)
expected time. For a degenerate simple polygon, its expected time bound is
O(n log² n + r^{17/11+ε}). Any of these algorithms can be used to compute the
straight skeleton of 𝒫^k(0), where 𝒫^k(0) is the polygon obtained from P by
inserting k_j zero-length edges at each reflex vertex v_j. The linear axis L^k(P)
corresponding to the sequence k is then obtained from SS(𝒫^k(0)) by removing
the bisectors incident to the reflex vertices of P.

However, if M(P) has only ε-clusters of constant size, L^k(P) can be computed
from the medial axis in linear time by adjusting the medial axis. In computing
the linear axis, we adjust each non-ε-edge of the medial axis to its counterpart
in the linear axis. When adjusting an edge b_ib_j, we first adjust the location of its
endpoints to the location of the endpoints of its counterpart. If an endpoint is part of
an ε-cluster, we first compute the counterparts of the nodes in this cluster based
on a local reconstruction of the linear wavefront. The adjustment of a node's
location is done in constant time if its ε-cluster has constant size. Finally, we
use lemmas 3 and 4 to replace the parabolic arc or the perpendicular bisector
with the corresponding chain of segments. We can now conclude that:

Theorem 2. For a polygon with ε-clusters of constant size only, a linear axis
ε-equivalent to the medial axis can be computed in linear time.

5 Examples 

We have implemented the algorithm ComputeHiddenEdges of section 4.1 and
the algorithm that constructs the linear axis from the medial axis described in
section 4.2. Figure 5 illustrates the straight skeleton (left column), medial axis
(middle column) and linear axis (right column) of the contours of a butterfly,
a dog and a ray. The contours come from the MPEG-7 Core Experiment
test set “CE-Shape-1”, which contains images of white objects on a black
background. The outer closed contour of the object in the image was extracted. In
Fig. 5. A comparison of the linear axis (right) with the medial axis (middle) and the
straight skeleton (left). The dashed lines in the middle column are those Voronoi edges
which are not part of the medial axis. The dashed lines in the right column represent
the bisectors traced by the reflex vertices of the wavefront, which are not part of the
linear axis. In these examples, the linear axis is isomorphic to the medial axis (ε = 0).



this contour, each pixel corresponds to a vertex. The number of vertices was
then reduced by applying the Douglas-Peucker [12] polygon approximation
algorithm. For the medial axis computation we used the AVD LEDA package [13],
which implements the construction of abstract Voronoi diagrams. The straight
skeleton implementation was based on the straightforward algorithm in [14].
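The Douglas-Peucker simplification mentioned above is short enough to sketch. This version works on an open polyline and measures distances to the chord segment rather than to its supporting line (a common variant that also tolerates coinciding endpoints). The tolerance used in the experiments is not stated, so any value here is an assumption, and a closed contour would first have to be split into two chains.

```python
import math

def _dist_to_segment(p, a, b):
    """Euclidean distance from p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def douglas_peucker(points, tol):
    """Simplify an open polyline: keep the point farthest from the chord
    between the endpoints if it is farther than tol, and recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tol:
        return [a, b]
    left = douglas_peucker(points[:index + 1], tol)
    right = douglas_peucker(points[index:], tol)
    return left[:-1] + right  # the split point appears once
```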

For the linear axes in figure 5, the number of hidden edges associated with
a reflex vertex is indicated by the number of dashed-line bisectors incident to
the reflex vertex that are not part of the linear axis; the number of incident
bisectors exceeds the number of hidden edges by one. We see
that a very small number of hidden edges gives a linear axis ε-equivalent to the
medial axis. The counter-intuitive results of the straight skeleton computation
for the butterfly and the dog are caused by sharp reflex vertices. Only two hidden
edges for the reflex vertex between the dog's front legs and for the sharp reflex
vertex in the butterfly are sufficient to get a linear axis ε-equivalent to the
medial axis. Though the reflex vertices of the ray contour are not very sharp, its
straight skeleton shows why this skeleton is unsuitable as a shape descriptor.
Two hidden edges associated with each reflex vertex at the body-tail junction give
a linear axis ε-equivalent to the medial axis. The largest number of hidden edges
in these examples is three, for the reflex vertex between the dog's hind legs.
Fig. 6. Decompositions based on split events of the linear axis give natural results
even if the polygon contains sharp reflex vertices.



Figure 6 shows the results of the decomposition of the same contours as in 
figure 1, but this time based on the split events in the linear axis. We see that 
the unwanted effects of the sharp reflex vertices are eliminated and the results 
of this decomposition look more natural. 

6 Concluding Remarks 

The insertion of zero-length edges at reflex vertices decreases the speed of the
wavefront points, which remedies the counter-intuitive effect of sharp reflex
vertices in the straight skeleton. Another way to handle sharp reflex vertices could
be to allow edges to translate in a non-parallel manner such that the variations
in the speed of the wavefront points along the wavefront are small. It is
unclear, however, how to do this without increasing the complexity of the skeleton,
or even whether it is possible to do this in such a way that the outcome is a linear
skeleton. Yet another way to handle sharp reflex vertices is to first apply an
outward propagation until the sharp reflex vertices have disappeared, and then
to propagate the front inwards. It is unclear, however, when to stop the outward
propagation so that relevant reflex vertices are not lost from the wavefront.
Abstract. The non-additive shortest path (NASP) problem asks for
finding an optimal path that minimizes a certain multi-attribute non-linear
cost function. In this paper, we consider the case of a non-linear
convex and non-decreasing function on two attributes. We present an efficient
polynomial algorithm for solving a Lagrangian relaxation of NASP.
We also present an exact algorithm that is based on new heuristics we
introduce here, and conduct a comparative experimental study with synthetic
and real-world data that demonstrates the quality of our approach.



1 Introduction 

The shortest path problem is a fundamental problem in network optimization 
encompassing a vast number of applications. Given a digraph with a cost function 
on its edges, the shortest path problem asks for finding a path between two nodes 
s and t of total minimum cost. The problem has been heavily studied under the 
additivity assumption stating that the cost of a path is the sum of the costs of its 
edges. In certain applications, however, the cost function may not be additive 
along paths and may not depend on a single attribute. This gives rise to the 
so-called non-additive shortest path (NASP) problem, which asks for finding an 
optimal path that minimizes a certain multi-attribute non-linear cost function. 
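A toy example of why such an objective is non-additive: when a convex utility is applied to the summed attributes of a path, the optimal path can differ from the one obtained by summing per-edge utilities. The graph, the attribute values and the utility U(c, r) = c + r² below are made up for illustration, and the brute-force path enumeration is only viable on tiny graphs.

```python
def simple_paths(graph, u, t, path=None):
    """Enumerate all simple u-t paths (exponential; tiny graphs only)."""
    path = [u] if path is None else path
    if u == t:
        yield list(path)
        return
    for w, _cost, _res in graph.get(u, []):
        if w not in path:
            path.append(w)
            yield from simple_paths(graph, w, t, path)
            path.pop()

def edge(graph, u, w):
    """(cost, resource) of the edge u -> w."""
    return next((c, r) for x, c, r in graph[u] if x == w)

def utility(cost, resource):
    # Hypothetical convex, non-decreasing utility of the two attributes.
    return cost + resource ** 2

def nasp_value(graph, path):
    """Non-additive objective: utility of the summed attributes."""
    edges = [edge(graph, u, w) for u, w in zip(path, path[1:])]
    return utility(sum(c for c, _ in edges), sum(r for _, r in edges))

def additive_value(graph, path):
    """What an additive model would compute: sum of per-edge utilities."""
    return sum(utility(*edge(graph, u, w)) for u, w in zip(path, path[1:]))

graph = {"s": [("a", 1, 2), ("t", 1, 3.5)], "a": [("t", 1, 2)]}
paths = list(simple_paths(graph, "s", "t"))
best_nasp = min(paths, key=lambda p: nasp_value(graph, p))
best_additive = min(paths, key=lambda p: additive_value(graph, p))
print(best_nasp, best_additive)
```

Here the two-edge path looks cheapest edge by edge, but its summed resource of 4 is penalized quadratically by the utility.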

In this work, we are interested in the version of NASP in which every edge
(and hence every path) is associated with two attributes, say cost and resource,
and in which the non-linear objective function is convex and non-decreasing.
This version of NASP has numerous applications in bicriteria network optimization
problems, especially in transportation networks; for example, it
is the core subproblem in finding traffic equilibria [4,12]. In such applications,
a utility cost function for a path p is defined that typically translates travel
time (and travel cost if necessary) into a common utility cost measure (e.g.,
money). Experience shows that users of traffic networks value time non-linearly
[8]: small amounts of time have relatively low value, while large amounts of
time are very valuable. Consequently, the most interesting theoretical models
for traffic equilibria [4,12] involve minimizing a monotonic non-linear utility
function of two attributes, travel cost and travel time, that allows a decision

* This work was partially supported by the IST Programme (6th FP) of EC under
contract No. IST-2002-001907 (integrated project DELIS).
maker to find the optimal path both w.r.t. travel cost and time. NASP is related
to multiobjective optimization, and actually appears to be a core problem
in this area; e.g., in multi-agent competitive network models [5]. The goal in
multiobjective optimization problems is to compute a set of solutions (paths in
the case of NASP) optimizing several cost functions (criteria). These problems
are typically solved by producing the so-called Pareto curve [11], which is the
set of all undominated Pareto-optimal solutions or “trade-offs” (the feasible
solutions whose attribute vector is not dominated by the attribute vector of any
other solution). Multiobjective optimization problems
are usually NP-hard. The difficulty in solving them stems from the fact that the
Pareto curve is typically exponential in size, and this applies to NASP itself.
Note that even if a decision maker is armed with the entire set of all undominated
Pareto-optimal solutions, s/he is still left with the problem of which is
the “best” solution for the application at hand. Consequently, three natural
approaches to solving multiobjective optimization problems are: (i) optimize
one objective while bounding the rest (cf. e.g., [2,6,9]); (ii) find approximation
schemes that generate a polynomial number of solutions which are within a
small factor in all objectives [11]; (iii) proceed in a normative way and choose
the “best” solution by introducing a utility function on the attributes involved.
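Given a finite list of candidate attribute vectors, the undominated ones can be filtered with one sort and a single sweep. This is only a sketch for a given list with both attributes minimised; generating the candidate paths, whose number can be exponential, is a separate matter, and the function name is hypothetical.

```python
def pareto_front(vectors):
    """Undominated (cost, resource) pairs when both attributes are minimised.

    After sorting by cost (then resource), a vector is undominated exactly
    when its resource is strictly smaller than every resource seen so far;
    the last front entry always holds that running minimum."""
    front = []
    for c, r in sorted(set(vectors)):
        if not front or r < front[-1][1]:
            front.append((c, r))
    return front

print(pareto_front([(1, 5), (2, 3), (3, 4), (2, 6), (4, 1)]))
```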

In this paper, we concentrate on devising an exact algorithm for solving
NASP as well as on computing a Lagrangian relaxation of the integer non-linear
programming (INLP) formulation of the problem, which is the first step in finding
the optimal solution. Solving the Lagrangian relaxation of NASP is equivalent
to computing the “best” path among those that lie on the convex hull of the
Pareto curve, called extreme paths. Note that these paths may also be exponentially
many and that the best extreme path is not necessarily the optimal one [10]. To
find the best extreme path, we adopt the so-called hull approach, which is a cutting
plane method that has been used in various forms and by various researchers
since 1966 [3]. Its first explicit use in bicriteria network optimization was made
in [6] for solving the Resource Constrained Shortest Path (RCSP)^ problem.
However, only recently has the method been considerably refined and its
worst-case complexity carefully analyzed [9]. Variations of the
hull approach for solving relaxations of NASP, when the non-linear objective
function is convex and non-decreasing, have previously appeared in [7,10], but
with very crude time analyses giving bounds proportional to the number of
extreme Pareto paths. Regarding the exact solution to NASP, we adopt and extend
the classical 3-phase approach introduced in [6] for solving RCSP, used later
in [2], and considerably refined in [9]: (a) compute lower and upper bounds to the
optimal solution, using the solution of the relaxed problem; (b) prune the graph
by deleting nodes and edges that cannot belong to an optimal path; (c) close the
gap between upper and lower bound. The pruning phase was not considered in [6,7].
Other approaches for solving NASP either involved heuristics based on path
flow formulations and repeatedly solving LPs [4,5], or were based on repeated
applications of RCSP [12] (one for each possible value of the resource of a path).



^ RCSP asks for a minimum-cost path whose resource does not exceed a specific bound. Costs and resources are additive along paths. RCSP is modelled as an ILP.
In this work, we provide new theoretical and experimental results for solving NASP. Our contributions are threefold: (i) We extend the hull algorithm in [9] and provide an efficient polynomial time algorithm for solving the Lagrangian relaxation of NASP. Note that the extension of a method that solves the relaxation of a linear program (RCSP) to a method that solves the relaxation of a non-linear program (NASP) is not straightforward. For integral costs and resources in $[0, C]$ and $[0, R]$, respectively, our algorithm runs in $O(\log(nRC)(m + n\log n))$ time, where $n$ and $m$ are the number of nodes and edges of the graph. To the best of our knowledge, this is the first efficient algorithm for solving a relaxation of NASP. Our algorithm provides both upper and lower bounds, while the one in [10] provides only the former. (ii) We give an exact algorithm to find the optimum of NASP based on the aforementioned 3-phase approach. Apart from the relaxation method, our algorithm also differs from those in [6,7,9,10] in the following: (a) we provide two new heuristics, the gradient and the on-line node reductions heuristics, that can be plugged into the extended hull algorithm and, as our experimental study shows, considerably speed up the first phase; (b) we give a method for gap closing that is not based on k-shortest paths, as those in [6,7], but constitutes an extension and improvement of the method for gap closing given in [9] for the RCSP problem. Again, our experiments show that our method is faster than those in [6,7,9]. (iii) We introduce a generalization of both NASP and RCSP, in which one wishes to minimize a non-linear, convex and non-decreasing function of two attributes such that neither attribute exceeds a specific bound, and show that the above results hold for the generalized problem, too.

Proofs and parts omitted due to space limitations can be found in [13]. 



2 Preliminaries and Problem Formulation 

In the following, let $G = (V, E)$ be a digraph with $n$ nodes and $m$ edges, and let $c: E \to \mathbb{R}_0^+$ be a function associating non-negative costs to the edges of $G$. The cost of an edge $e \in E$ will be denoted by $c(e)$. To better introduce the definition of NASP, we start with an Integer Linear Programming (ILP) formulation of the well-known additive shortest path problem given in [9]. Let $P$ be the set of paths from node $s$ to node $t$ in $G$. For each path $p \in P$ we introduce a 0-1 variable $x_p$, and we denote by $c_p$, or by $c(p)$, the path's cost, i.e., $c_p = c(p) = \sum_{e \in p} c(e)$. Then, the ILP for the additive shortest path is as follows.

$$\min \sum_{p \in P} c_p x_p \tag{1}$$

$$\text{s.t.} \quad \sum_{p \in P} x_p = 1; \quad x_p \in \{0, 1\}, \ \forall p \in P \tag{2}$$

Suppose now that, in addition to the cost function $c$, there is also a resource function $r: E \to \mathbb{R}_0^+$ associating each edge $e \in E$ with a non-negative resource $r(e)$. We denote the resource of a path $p$ by $r_p$, or by $r(p)$, and define it as $r_p = r(p) = \sum_{e \in p} r(e)$. Consider now a non-decreasing and convex function $U: \mathbb{R}_0^+ \to \mathbb{R}_0^+$ with first derivative $U': \mathbb{R}_0^+ \to \mathbb{R}_0^+$. Function $U$ is a translation function that converts the resource of a path to cost. Now, the shortest path problem asks for a path $p \in P$ which minimizes the overall cost, that is:

$$\min \sum_{p \in P} c_p x_p + U\Big(\sum_{p \in P} r_p x_p\Big) \tag{3}$$

Replacing (1) by the new objective function (3) and keeping the same constraints (2), we get an Integer Non-Linear Program (INLP). Since the overall cost of a path is no longer additive on its edges, the problem is called the non-additive shortest path problem (NASP). It is easy to see that the solution to
NASP is a Pareto-optimal path. In the full generality of the problem, there is 
also a translation function for cost (cf. Section 5). We start from this simpler 
objective function, since it is the most commonly used in applications (see e.g., 
[4,5,12]), and it better introduces our approach. 
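To make objective (3) concrete, the following minimal Python sketch enumerates all s-t paths of a tiny, hypothetical digraph and evaluates $c(p) + U(r(p))$ for each, assuming $U(z) = z^2/100$. The graph and all function names are illustrative assumptions, and exhaustive enumeration is exponential in general. In this made-up instance the minimum-cost path s-a-t (objective 6) and the minimum-resource path s-b-t (objective 10.16) are both beaten by s-d-t (objective 5), a Pareto-optimal path that minimizes neither attribute on its own.

```python
def all_paths(adj, s, t, path=None):
    """Yield every simple s-t path (as a list of nodes) by depth-first search."""
    path = [s] if path is None else path
    if s == t:
        yield list(path)
        return
    for v in adj.get(s, {}):
        if v not in path:
            path.append(v)
            yield from all_paths(adj, v, t, path)
            path.pop()


def cost_and_resource(adj, path):
    """c(p) and r(p): both are sums over the edges of p."""
    edges = list(zip(path, path[1:]))
    return (sum(adj[u][v][0] for u, v in edges),
            sum(adj[u][v][1] for u, v in edges))


def U(z):
    return z * z / 100          # assumed convex, non-decreasing translation function


def nasp_objective(adj, path):
    c, r = cost_and_resource(adj, path)
    return c + U(r)             # objective (3) for the single path with x_p = 1


# Hypothetical digraph: adj[u][v] = (cost, resource).
adj = {
    "s": {"a": (1, 10), "b": (5, 2), "d": (2, 5)},
    "a": {"t": (1, 10)},
    "b": {"t": (5, 2)},
    "d": {"t": (2, 5)},
}
for p in all_paths(adj, "s", "t"):
    print(p, cost_and_resource(adj, p), nasp_objective(adj, p))
best = min(all_paths(adj, "s", "t"), key=lambda p: nasp_objective(adj, p))
print("optimum:", best)          # ['s', 'd', 't']
```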



3 Lagrangian Relaxation and Its Solution 



We can transform the above INLP into an equivalent one by introducing a variable $z$ and setting it equal to $z = \sum_{p \in P} r_p x_p$. Thus, the INLP objective function becomes $\min \sum_{p \in P} c_p x_p + U(z)$, while an additional equality constraint is introduced: $\sum_{p \in P} r_p x_p = z$. Since $U(\cdot)$ is a non-decreasing function, we can relax this equality constraint to an inequality, and get the following INLP:

$$\min \sum_{p \in P} c_p x_p + U(z)$$

$$\text{s.t.} \quad \sum_{p \in P} r_p x_p \le z \tag{4}$$

$$\sum_{p \in P} x_p = 1; \quad x_p \in \{0, 1\}, \ \forall p \in P$$

We can relax the above using Lagrangian relaxation: introduce a Lagrangian multiplier $\mu \ge 0$, multiply the two sides of (4) by $\mu$, and move constraint (4) into the objective function. The objective now becomes $L(\mu) = \min \sum_{p \in P} (c_p + \mu r_p) x_p + U(z) - \mu z$, while constraints (2) remain unchanged. $L(\mu)$ is a lower bound on the optimal value of the objective function (3) for any value of the Lagrangian multiplier $\mu$ [1, Ch. 16]. The best lower bound can be found by solving the problem $\max_{\mu \ge 0} L(\mu)$ (equivalent to solving the NLP relaxation of NASP, obtained by relaxing the integrality constraints). Since none of the constraints in $L(\mu)$ contains the variable $z$, the problem decomposes into two separate problems $L_P(\mu)$ and $L_z(\mu)$ such that

$$L(\mu) = L_P(\mu) + L_z(\mu),$$

where

$$L_P(\mu) = \min \sum_{p \in P} (c_p + \mu r_p) x_p \quad \text{s.t.} \quad \sum_{p \in P} x_p = 1; \quad x_p \in \{0, 1\}, \ \forall p \in P$$

and

$$L_z(\mu) = \min_z \; U(z) - \mu z \tag{5}$$

Observe now that for a fixed $\mu$: (i) $L_P(\mu)$ is a standard additive shortest path problem w.r.t. the composite (non-negative) edge weights $w(\cdot) = c(\cdot) + \mu r(\cdot)$ and can be solved using Dijkstra's shortest path algorithm in $O(m + n \log n)$ time; (ii) $L_z(\mu)$ is a standard minimization problem of a (single-variable) convex function. These crucial observations, along with some technical lemmata, will allow us to provide an efficient polynomial algorithm for solving the relaxed version (Lagrangian relaxation) of the above INLP.
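The decomposition can be checked numerically. The sketch below assumes $U(z) = a z^2$ with $a > 0$, so that $L_z(\mu)$ has the closed-form minimizer $z = \mu/(2a)$, and evaluates $L(\mu) = L_P(\mu) + L_z(\mu)$ with one Dijkstra run on the composite weights $c + \mu r$. The graph and the function names are hypothetical.

```python
import heapq


def dijkstra(adj, s, t, weight):
    """Length of a shortest s-t path w.r.t. weight(cost, resource) >= 0."""
    dist, done = {s: 0.0}, set()
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            return d
        for v, (c, r) in adj.get(u, {}).items():
            nd = d + weight(c, r)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def lagrangian_value(adj, s, t, mu, a):
    """L(mu) = L_P(mu) + L_z(mu), assuming U(z) = a * z**2 with a > 0."""
    l_p = dijkstra(adj, s, t, lambda c, r: c + mu * r)  # additive SP on c + mu*r
    z = mu / (2 * a)                                    # minimizer of a*z^2 - mu*z
    l_z = a * z * z - mu * z                            # equals -mu^2 / (4a)
    return l_p + l_z


adj = {  # hypothetical graph, adj[u][v] = (cost, resource)
    "s": {"a": (1, 10), "b": (5, 2), "d": (2, 5)},
    "a": {"t": (1, 10)}, "b": {"t": (5, 2)}, "d": {"t": (2, 5)},
}
for mu in (0.0, 0.1, 0.2, 0.3, 1.0):
    print(mu, lagrangian_value(adj, "s", "t", mu, a=0.01))
```

On this instance $L(\mu)$ never exceeds 5, the objective value $4 + U(10)$ of path s-d-t, and reaches it at $\mu = 0.2$, so the relaxation happens to be tight here; in general a duality gap may remain.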

3.1 The Ingredients 

In the following, let $p = SP(s, t, c, r, \mu)$ denote a shortest path routine that w.l.o.g. always returns the same s-t path $p$ that is shortest w.r.t. edge weights $w(e) = c(e) + \mu r(e)$, $\forall e \in E$, and let $\mathrm{argmin}\{f(z)\}$ denote the maximum $z$ that minimizes $f(z)$. The next lemma relates resources and costs of shortest paths w.r.t. different $\mu$.

Lemma 1. Let $p_l = SP(s,t,c,r,\mu_l)$, $p_h = SP(s,t,c,r,\mu_h)$, $z_l = \mathrm{argmin}\{U(z) - \mu_l z\}$, $z_h = \mathrm{argmin}\{U(z) - \mu_h z\}$, and $0 \le \mu_l < \mu_h$. Then, (i) $r(p_l) \ge r(p_h)$; (ii) $c(p_l) \le c(p_h)$; (iii) $r(p_l) - z_l \ge r(p_h) - z_h$; and (iv) $c(p_l) + U(z_l) \le c(p_h) + U(z_h)$.

Lemma 1 allows us to use any line-search method to either find an optimal solution or a lower bound to the problem. Here, we follow the framework of the hull approach and in particular of its refinement in [9] for solving the RCSP problem (hull algorithm), and extend it to NASP. The approach can be viewed as performing a kind of binary search between a feasible and an infeasible solution w.r.t. constraint (4). In each step the goal is to find a pair $(p, z)$ that either corresponds to a new feasible solution that reduces the objective value $c(p) + U(z)$, or corresponds to an infeasible solution that decreases the amount $r(p) - z$ by which constraint (4) is violated.

The next two lemmata allow the effective extension of the hull algorithm to 
NASP. The first lemma shows how such a solution can be found by suitably 
choosing the Lagrangian multiplier at each iteration. 

Lemma 2. Let $p_l = SP(s,t,c,r,\mu_l)$, $p_h = SP(s,t,c,r,\mu_h)$, $z_l = \mathrm{argmin}\{U(z) - \mu_l z\}$, $z_h = \mathrm{argmin}\{U(z) - \mu_h z\}$, $p_m = SP(s,t,c,r,\mu_m)$, $z_m = \mathrm{argmin}\{U(z) - \mu_m z\}$, $0 \le \mu_l < \mu_h$, and $\mu_m = \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$. Then, (i) $r(p_l) - z_l \ge r(p_m) - z_m \ge r(p_h) - z_h$, and (ii) $c(p_l) + U(z_l) \le c(p_m) + U(z_m) \le c(p_h) + U(z_h)$.

The previous lemma can be usefully applied when we have an efficient way to test for feasibility. Compared to RCSP, where this test was trivial (just checking whether the resource of a path is smaller than a given bound), checking whether $(p, z)$ is feasible reduces to finding the inverse of $U'(z)$ (as objective (5) dictates), and this may not be trivial. The next lemma allows us not only to test for feasibility efficiently, but also to avoid finding the inverse of $U'(z)$.

Lemma 3. Let $\mu \ge 0$ with corresponding path $p = SP(s,t,c,r,\mu)$, and let $z^* = \mathrm{argmin}\{U(z) - \mu z\}$. Then, $(p, z^*)$ is a feasible solution w.r.t. constraint (4) if $U'(r(p)) \le \mu$.
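For a strictly convex $U$ the direct test $r(p) \le z^*$ and the test of Lemma 3 coincide, which the following small check illustrates under the assumption $U(z) = z^2/4$ (so $U'(z) = z/2$ and $z^* = 2\mu$). Exact rational arithmetic from Python's fractions module avoids rounding at the boundary. Only the second function corresponds to the test of Lemma 3; all names are hypothetical.

```python
from fractions import Fraction

A = Fraction(1, 4)                 # assumed U(z) = A * z**2, so U'(z) = 2*A*z


def U_prime(z):
    return 2 * A * z


def feasible_via_inverse(r_p, mu):
    """Direct test r(p) <= z*, where z* = [U']^{-1}(mu) must be computed."""
    z_star = mu / (2 * A)
    return r_p <= z_star


def feasible_via_lemma3(r_p, mu):
    """Lemma 3 test: only U' is evaluated, no inverse is needed."""
    return U_prime(r_p) <= mu


agree = all(feasible_via_inverse(r, mu) == feasible_via_lemma3(r, mu)
            for r in range(0, 30) for mu in map(Fraction, (0, 1, 2, 5, 8)))
print(agree)                       # True
```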



3.2 The Extended Hull Algorithm 

We are now ready to give the algorithm for solving the relaxed INLP. Call a multiplier $\mu$ feasible if $\mu$ leads to a feasible solution pair $(p, z)$; otherwise, call $\mu$ infeasible. During the process we maintain two multipliers $\mu_l$ and $\mu_h$ as well as the corresponding paths $p_l$ and $p_h$. Multiplier $\mu_l$ is an infeasible one, while $\mu_h$ is feasible. Initially, $\mu_l = 0$ and the corresponding path $p_l$ is the minimum cost path, while $\mu_h = +\infty$ and the corresponding path $p_h$ is the minimum resource path. We adopt the geometric interpretation introduced in [9] and represent each path as a point in the r-c-plane.

In each iteration of the algorithm we find a path corresponding to the multiplier $\mu_m = \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$. This is equivalent to moving the line connecting $p_h$ and $p_l$ with slope $-\mu_m$ along its normal direction until we hit an extreme point in that direction (see Fig. 1(a)). Multiplier $\mu_m$ leads to a solution $(p_m, z_m)$. We check whether $(p_m, z_m)$ is feasible using Lemma 3. This is equivalent to checking whether $U'(r(p_m)) - \mu_m$ is positive, negative, or zero. In the latter case, we have found the optimal solution. If it is positive, then by Lemma 2 we have to move to the "left" by setting $\mu_l$ equal to $\mu_m$ and $p_l$ to $p_m$; otherwise, we have to move to the "right" by setting $\mu_h$ equal to $\mu_m$ and $p_h$ to $p_m$.

We iterate this procedure as long as $\mu_l < \mu_m < \mu_h$, i.e., as long as we are able to find a new path below the line connecting the paths $p_l$ and $p_h$. When such a path cannot be found, the algorithm stops. As we shall see next, in this case either one of $p_l$ and $p_h$ is optimal, or there is a duality gap. We call the above the extended hull algorithm.
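The following Python sketch implements the iteration described above, followed by the case analysis of Lemma 5. It is an illustration under stated assumptions rather than the authors' implementation: SP is a plain Dijkstra that breaks ties towards smaller resource, multipliers are kept as exact fractions so that the "below the line" test does not suffer from rounding, $U'$ is passed in as a function, and the graph, helper names, and iteration cap are hypothetical. Paths are represented by (cost, resource, nodes) triples, i.e., by their points in the r-c-plane.

```python
import heapq
import math
from fractions import Fraction


def sp(adj, s, t, mu):
    """SP(s, t, c, r, mu): Dijkstra w.r.t. w(e) = c(e) + mu * r(e).

    Ties are broken towards smaller resource; mu == math.inf returns a
    minimum-resource path (ties broken by cost). Assumes t is reachable.
    Returns (c(p), r(p), nodes of p).
    """
    def key(c, r):
        return (r, c) if mu == math.inf else (c + mu * r, r)

    dist, pred, done = {s: (0, 0)}, {s: None}, set()
    heap = [((0, 0), s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, (c, r) in adj.get(u, {}).items():
            k = key(c, r)
            nd = (d[0] + k[0], d[1] + k[1])
            if v not in dist or nd < dist[v]:
                dist[v], pred[v] = nd, u
                heapq.heappush(heap, (nd, v))
    nodes, u = [], t
    while u is not None:
        nodes.append(u)
        u = pred[u]
    nodes.reverse()
    edges = list(zip(nodes, nodes[1:]))
    return (sum(adj[a][b][0] for a, b in edges),
            sum(adj[a][b][1] for a, b in edges), nodes)


def extended_hull(adj, s, t, U_prime, max_iter=64):
    """Extended hull algorithm followed by the case analysis of Lemma 5."""
    mu_l, p_l = Fraction(0), sp(adj, s, t, Fraction(0))  # minimum-cost path
    mu_h, p_h = math.inf, sp(adj, s, t, math.inf)       # minimum-resource path
    if U_prime(p_l[1]) <= mu_l:        # U'(r(p_l)) = 0: Lemma 4(iii) at mu = 0
        return "optimal", p_l
    for _ in range(max_iter):
        if p_l[1] == p_h[1]:
            break
        mu_m = Fraction(p_h[0] - p_l[0], p_l[1] - p_h[1])
        if not mu_l < mu_m < mu_h:
            break
        p_m = sp(adj, s, t, mu_m)
        W = p_l[0] + mu_m * p_l[1]               # line c + mu_m * r = W
        if p_m[0] + mu_m * p_m[1] >= W:          # no new path below the line
            break
        g = U_prime(p_m[1]) - mu_m               # sign test of Lemma 3
        if g == 0:
            return "optimal", p_m                # Lemma 4(iii)
        if g > 0:
            mu_l, p_l = mu_m, p_m                # infeasible: move mu_l up
        else:
            mu_h, p_h = mu_m, p_m                # feasible: move mu_h down
    if p_l[1] == p_h[1]:
        return "optimal", p_l
    mu_m = Fraction(p_h[0] - p_l[0], p_l[1] - p_h[1])
    if mu_l < U_prime(p_l[1]) <= mu_m:           # Lemma 5(i)
        return "optimal", p_l
    if mu_m <= U_prime(p_h[1]) < mu_h:           # Lemma 5(ii)
        return "optimal", p_h
    return "gap", (p_l, p_h, mu_m)               # Lemma 5(iii)


# Hypothetical graph: adj[u][v] = (cost, resource); U(z) = z^2 / 100.
adj = {
    "s": {"a": (1, 10), "b": (5, 2), "d": (2, 5)},
    "a": {"t": (1, 10)},
    "b": {"t": (5, 2)},
    "d": {"t": (2, 5)},
}
print(extended_hull(adj, "s", "t", lambda r: Fraction(r, 50)))
# prints ('optimal', (4, 10, ['s', 'd', 't']))
```

On the made-up graph the search needs two shortest path computations beyond the initial ones, after which Lemma 5(ii) certifies $p_h$ = s-d-t as optimal.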




Fig. 1. The algorithm. (a) First iteration. (b) Last iteration.
We are now ready to discuss its correctness. The following lemma implies that the paths $p_l$ and $p_h$ maintained during each iteration of the algorithm give us upper bounds on the resource and the cost of the optimal path.

Lemma 4. Let $\mu \ge 0$ be a fixed multiplier and let $p = SP(s,t,c,r,\mu)$ be its corresponding path. (i) If $U'(r(p)) > \mu$, then $r(p)$ is an upper bound on the resource of the optimal path. (ii) If $U'(r(p)) < \mu$, then $c(p)$ is an upper bound on the cost of the optimal path. (iii) If $U'(r(p)) = \mu$, then $p$ is optimal.

The next lemma (see Fig. 1(b)) completes the discussion on the correctness. 

Lemma 5. Let $0 \le \mu_l < \mu_h$ be the multipliers at the end of the algorithm, $p_l = SP(s,t,c,r,\mu_l)$ and $p_h = SP(s,t,c,r,\mu_h)$ be their corresponding paths, and let $\mu_m = \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$. (i) If $\mu_l < U'(r(p_l)) \le \mu_m$, then $p_l$ is optimal. (ii) If $\mu_m \le U'(r(p_h)) < \mu_h$, then $p_h$ is optimal. (iii) If $U'(r(p_h)) < \mu_m < U'(r(p_l))$, then there may be a duality gap. In this case, the optimum lies in the triangle defined by the lines $r = r(p_l)$, $c = c(p_h)$, and the line connecting $p_l$ and $p_h$.

Proof. From Lemma 2, we have that $\mu_l < \mu_m = \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)} < \mu_h$. We will also use the fact (cf. [13]) that if we have found a path which is shortest for both multipliers $\mu_l$ and $\mu_h$, then it is shortest for any $\mu \in [\mu_l, \mu_h]$.

(i) If $\mu_l < U'(r(p_l)) \le \mu_m$, from the above fact we have that $\forall q \in P$: $c(p_l) + U'(r(p_l))\, r(p_l) \le c(q) + U'(r(p_l))\, r(q)$. Thus, Lemma 4(iii) implies that $p_l$ is optimal. (ii) Similar to (i). (iii) Follows directly from Lemma 4 and the stopping condition of the algorithm. □

In the case where there is a duality gap (see Fig. 1(b)), we can compute a lower bound to the actual optimum of the original INLP. Let the line connecting $p_l$ and $p_h$ be $c + \mu_m r = W$, where $W = c(p_l) + \mu_m r(p_l) = c(p_h) + \mu_m r(p_h)$. In the best case the optimum path $(\bar c, \bar r)$ lies on that line and therefore it must be $\bar c + \mu_m \bar r = W$. A lower bound to the actual optimum is thus $\bar c + U(\bar r)$, where $(\bar c, \bar r) = (W - \mu_m [U']^{-1}(\mu_m), [U']^{-1}(\mu_m))$ is the solution of the system of equations $c + \mu_m r = W$ and $U'(r) = \mu_m$ (here $[U']^{-1}(\cdot)$ denotes the inverse function of $U'(\cdot)$).
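As a small numeric illustration of this bound, the sketch below assumes $U(r) = r^2/10$, so that $[U']^{-1}(\mu) = 5\mu$, and uses hypothetical final paths $p_l = (c, r) = (0, 10)$ and $p_h = (10, 0)$. It also reports the better of the two final paths as an upper bound; that part is our own illustrative addition.

```python
from fractions import Fraction


def gap_bounds(p_l, p_h, U, U_prime_inv):
    """Lower/upper bounds on the NASP optimum when a duality gap remains.

    p_l, p_h: (cost, resource) of the final paths; U_prime_inv = [U']^{-1}.
    """
    (c_l, r_l), (c_h, r_h) = p_l, p_h
    mu = Fraction(c_h - c_l, r_l - r_h)      # the line through p_l, p_h has slope -mu
    W = c_l + mu * r_l                       # ... and equation c + mu * r = W
    r_bar = U_prime_inv(mu)                  # solves U'(r) = mu
    c_bar = W - mu * r_bar
    lower = c_bar + U(r_bar)
    upper = min(c_l + U(r_l), c_h + U(r_h))  # both are actual s-t paths
    return lower, upper


U = lambda r: Fraction(r * r, 10)            # assumed U(r) = r^2 / 10
U_prime_inv = lambda mu: 5 * mu              # since U'(r) = r / 5
print(gap_bounds((0, 10), (10, 0), U, U_prime_inv))
```

With these numbers $\mu_m = 1$, $W = 10$, $(\bar c, \bar r) = (5, 5)$, and the lower bound is $5 + U(5) = 7.5$, while the upper bound is 10. A hypothetical path with $(c, r) = (5, 6)$, which lies above the line and hence is not an extreme path, has objective $5 + 3.6 = 8.6$ and falls between the two bounds.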

We now turn to the running time of the algorithm. The algorithm's progress can be measured by how much the area of the triangular unexplored region (defined by points $p_l$, $p_h$, and the intersection of the lines with slopes $-\mu_l$ and $-\mu_h$ passing through $p_l$ and $p_h$, respectively) reduces at each iteration. We can prove (in a way similar to that in [9]; cf. [13]) that the area is reduced by at least 3/4.
For integral costs and resources in $[0, C]$ and $[0, R]$, respectively, the area of the initial triangle is at most $\frac{1}{2}(nR)(nC)$, the area of the minimum triangle is $\frac{1}{2}$, and the unexplored region is at least divided by four at each iteration. Thus, the algorithm ends after $O(\log(nRC))$ iterations. In each iteration a call to Dijkstra's algorithm is made. Hence, we have established the following.

Theorem 1. The extended hull algorithm solves the Lagrangian relaxation of NASP in $O(\log(nRC)(m + n \log n))$ time.
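A worked version of the iteration count, assuming only that the unexplored area starts at no more than $\frac{1}{2}(nR)(nC)$, shrinks by a factor of at least four per iteration, and cannot drop below some constant minimum area $A_{\min} > 0$ (with $R, C \ge 1$):

$$\frac{\tfrac{1}{2}(nR)(nC)}{4^{k}} \ge A_{\min} \;\Longrightarrow\; k \le \log_4 \frac{n^2 R C}{2A_{\min}} \le \log_4 \frac{(nRC)^2}{2A_{\min}} = O(\log(nRC)).$$

Multiplying by the $O(m + n \log n)$ cost of one Dijkstra run per iteration yields the bound of Theorem 1.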
4 The Exact Algorithm

The exact algorithm for solving NASP follows the general outline of the algo- 
rithms in [2,6,9] for solving RCSP and consists of three phases. In the first phase, 
we solve the relaxed problem using the extended hull algorithm. This provides us 
with a lower and an upper bound for the objective value as well as with an upper 
bound on the cost and the resource of the optimal path. In the second phase, 
we use these cost and resource upper bounds to prune the graph by deleting 
nodes and edges that cannot lie on the optimal path (details for this phase can 
be found in [13]). In the third phase, we close the gap between lower and upper 
bound and find the actual optimum by applying a path enumerating process in 
the pruned graph. 

4.1 Improving the Efficiency of the Extended Hull Algorithm 

To achieve better running times for phase 1, we develop two new heuristics that can be plugged into the extended hull algorithm. The first heuristic aims to reduce the number of iterations by identifying cases where we can increase the algorithm's progress by selecting a multiplier other than the one suggested by the hull approach. We refer to it as the gradient heuristic. The second heuristic aims at reducing the problem size by performing node reductions on the fly, using information obtained from previous iterations. We refer to it as the on-line node reductions heuristic.



Gradient Heuristic. Assume that there exists a multiplier $\mu'$ such that $\mu_l < \mu' < \mu_h$ and for which we can tell a priori whether the corresponding path is feasible or not (i.e., without computing it). We can then distinguish two cases in which we can examine $\mu'$ instead of $\mu_m$ and achieve better progress: (a) If $\mu'$ leads to an infeasible solution and $\mu_l < \mu_m < \mu'$, then Lemma 1 implies that $\mu_m$ also leads to an infeasible solution. Therefore, since both $\mu_m$ and $\mu'$ lead to an infeasible path, and $\mu'$ is closer to $\mu_h$ than $\mu_m$, we can use $\mu'$ instead of $\mu_m$ in the update step and thus make greater progress. (b) Similarly, if $\mu'$ leads to a feasible solution and $\mu' < \mu_m < \mu_h$, then $\mu_m$ also leads to a feasible solution. Again, we can use $\mu'$ instead of $\mu_m$ in the update step, since it is closer to $\mu_l$. The following lemma suggests that when we are given a feasible (resp. infeasible) multiplier $\mu$, we can always find an infeasible (resp. feasible) multiplier $\mu'$. Its proof follows from Lemma 1 and the convexity of $U(\cdot)$.

Lemma 6. Let $\mu \ge 0$ be a fixed multiplier, $p = SP(s,t,c,r,\mu)$, $\mu' = U'(r(p))$, and $q = SP(s,t,c,r,\mu')$. (i) If $\mu' < \mu$, then $U'(r(q)) \ge \mu'$. (ii) If $\mu' > \mu$, then $U'(r(q)) \le \mu'$.

Lemma 6, along with the above discussion, leads to the following heuristic that can be plugged into the extended hull algorithm. Let $0 \le \mu_l < \mu_h$ be the multipliers at some iteration of the algorithm, and let $p_l$, $p_h$ be the corresponding paths. If $U'(r(p_h)) > \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$, then set $\mu_m = U'(r(p_h))$. If $U'(r(p_l)) < \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$, then set $\mu_m = U'(r(p_l))$. Otherwise, set $\mu_m = \frac{c(p_h) - c(p_l)}{r(p_l) - r(p_h)}$.
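The selection rule above can be sketched as a small function. It assumes, as in the text, that $p_l$ is infeasible and $p_h$ feasible, takes $U'$ as a parameter, and uses exact fractions; the function name and the sample values (with $U(z) = z^2/100$) are hypothetical.

```python
from fractions import Fraction


def gradient_multiplier(p_l, p_h, U_prime):
    """Multiplier mu_m proposed by the gradient heuristic.

    p_l, p_h: (cost, resource) of the current paths, with r(p_l) > r(p_h),
    p_l infeasible and p_h feasible.
    """
    (c_l, r_l), (c_h, r_h) = p_l, p_h
    hull = Fraction(c_h - c_l, r_l - r_h)   # multiplier of the plain hull step
    if U_prime(r_h) > hull:
        return U_prime(r_h)   # not feasible by Lemma 6(i); closer to mu_h than hull
    if U_prime(r_l) < hull:
        return U_prime(r_l)   # feasible by Lemma 6(ii); closer to mu_l than hull
    return hull


U_prime = lambda r: Fraction(r, 50)          # assumed U(z) = z^2 / 100
print(gradient_multiplier((2, 20), (10, 4), U_prime))   # 2/5
```

Here the plain hull step would test $\mu = 1/2$, whereas the gradient rule proposes $U'(r(p_l)) = 2/5$, which Lemma 6(ii) already guarantees to be feasible, so the next update moves $\mu_h$ further towards $\mu_l$.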
On-line Node Reductions Heuristic. Further progress in the extended hull algorithm can be made by temporarily eliminating, at each iteration, those nodes that cannot lie on any path obtained by any subsequent iteration of the algorithm. These temporary eliminations of nodes are done in an on-line fashion and apply only to phase 1. Let $d_\mu(x, y)$ denote a lower bound on the cost of a shortest x-y path w.r.t. the edge weights $w(\cdot) = c(\cdot) + \mu r(\cdot)$. Then, we can safely exclude from the graph any node $u$ for which any of the following holds:

$$d_{\mu_m}(s, u) + d_{\mu_m}(u, t) > c(p_h) + \mu_m r(p_h) \tag{6}$$

$$d_{\mu_l}(s, u) + d_{\mu_l}(u, t) > c(p_h) + \mu_l r(p_h) \tag{7}$$

$$d_{\mu_h}(s, u) + d_{\mu_h}(u, t) > c(p_l) + \mu_h r(p_l) \tag{8}$$

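The above lower bound distances can be found using the following lemma.

Lemma 7. Let $d_{\tilde\mu_l}(u, v)$ and $d_{\tilde\mu_h}(u, v)$ be lower bounds of the cost of a shortest u-v path w.r.t. edge weights $c(\cdot) + \tilde\mu_l r(\cdot)$ and $c(\cdot) + \tilde\mu_h r(\cdot)$, respectively, where $\tilde\mu_l < \tilde\mu_h$. Let $\bar\mu \in [\tilde\mu_l, \tilde\mu_h]$. Then, $d_{\bar\mu}(u, v) = \theta\, d_{\tilde\mu_l}(u, v) + (1 - \theta)\, d_{\tilde\mu_h}(u, v)$ is also a lower bound of the cost of a shortest u-v path w.r.t. edge weights $c(\cdot) + \bar\mu r(\cdot)$, where $\theta = \frac{\tilde\mu_h - \bar\mu}{\tilde\mu_h - \tilde\mu_l}$.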
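The interpolation of Lemma 7 and the reduction tests (6)-(8) are short enough to sketch directly. The code below is illustrative only: the lower-bound callables, the multipliers, and the (cost, resource) pairs are hypothetical values, and exact fractions keep the comparisons free of rounding.

```python
from fractions import Fraction


def interpolate_lb(mu, mu_a, d_a, mu_b, d_b):
    """Lemma 7: lower bound at mu in [mu_a, mu_b] from lower bounds at mu_a < mu_b."""
    theta = Fraction(mu_b - mu) / (mu_b - mu_a)
    return theta * d_a + (1 - theta) * d_b


def is_reducible(lb, mu_l, mu_m, mu_h, p_l, p_h):
    """Test inequalities (6)-(8) for one node u.

    lb(mu) must return a lower bound on d_mu(s, u) + d_mu(u, t);
    p_l, p_h are the (cost, resource) pairs of the current paths.
    """
    (c_l, r_l), (c_h, r_h) = p_l, p_h
    return (lb(mu_m) > c_h + mu_m * r_h      # (6)
            or lb(mu_l) > c_h + mu_l * r_h   # (7)
            or lb(mu_h) > c_l + mu_h * r_l)  # (8)


# True shortest-path cost min(3 + 10*mu, 7) is known only at mu = 0 and mu = 1:
print(interpolate_lb(Fraction(1, 2), 0, 3, 1, 7))        # 5

mu_l, mu_m, mu_h = Fraction(0), Fraction(1, 2), Fraction(1)
p_l, p_h = (2, 20), (10, 4)
lb_u = lambda mu: 12 + 20 * mu   # every s-t path through u has (c, r) = (12, 20)
lb_v = lambda mu: 4 + 10 * mu    # v lies on a path with (c, r) = (4, 10)
print(is_reducible(lb_u, mu_l, mu_m, mu_h, p_l, p_h),
      is_reducible(lb_v, mu_l, mu_m, mu_h, p_l, p_h))    # True False
```

The interpolated bound 5 is indeed below the true value $\min(3 + 10 \cdot \tfrac12, 7) = 7$, as Lemma 7 guarantees; node u is excluded by (6), whereas v survives all three tests.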

To apply the on-line node reductions heuristic, we maintain a set $R$ that consists of the nodes reduced until that iteration. Additionally, we maintain the lower bound distances $d_{\tilde\mu_l^s}(s, u)$, $d_{\tilde\mu_h^s}(s, u)$, $d_{\tilde\mu_l^t}(u, t)$, $d_{\tilde\mu_h^t}(u, t)$, $\forall u \in V - R$, where $\tilde\mu_l^s, \tilde\mu_l^t \le \mu_l$ and $\tilde\mu_h^s, \tilde\mu_h^t \ge \mu_h$. These distances have been computed in previous shortest path computations (initially, they are set to zero). We call $u$ reduced either if $u \in R$, or if any of the inequalities (6), (7), or (8) is satisfied.

In each iteration we test a multiplier $\mu_m$ by performing a forward (resp. backward) run of Dijkstra's algorithm [13], using $w(\cdot) = c(\cdot) + \mu_m r(\cdot)$ as edge weights. For each node $u$ extracted from the priority queue (used in Dijkstra's algorithm), we check whether $u$ is reduced or not. If $u$ is reduced but not in $R$, then we insert it into $R$. To check the inequalities, we need the lower bound distances in the LHS of (6), (7), and (8), which can be found by applying Lemma 7 on the known distances $d_{\tilde\mu_l^s}(s, u)$, $d_{\tilde\mu_h^s}(s, u)$, $d_{\tilde\mu_l^t}(u, t)$, and $d_{\tilde\mu_h^t}(u, t)$. If $u$ is reduced, then we do not need to examine its outgoing (resp. incoming) edges. We stop Dijkstra's algorithm when the target (resp. source) node is settled. At this point, by setting the distances of the non-reduced unsettled nodes to the distance of the target (resp. source), we obtain new lower bound distance information, which can be used in subsequent iterations by updating the current information appropriately.
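A forward run of the pruned Dijkstra search described above might look as follows. This is a sketch under assumptions: the predicate `is_reduced` stands in for the tests (6)-(8) and is a hypothetical callable, every node appears as a key of `adj`, and only the forward direction is shown (the backward run is symmetric on the reversed graph).

```python
import heapq


def pruned_dijkstra(adj, s, t, mu, reduced, is_reduced):
    """Forward Dijkstra w.r.t. c + mu*r that skips reduced nodes.

    reduced    -- the set R, updated in place
    is_reduced -- hypothetical predicate testing inequalities (6)-(8) for a node
    Returns (d_mu(s, t), lower), where lower[u] is a lower bound on d_mu(s, u)
    for every non-reduced node u; returns (None, {}) if t is not reached.
    """
    dist, settled = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled[u] = d
        if u == t:
            break                                  # stop once the target is settled
        if u in reduced or is_reduced(u):
            reduced.add(u)                         # record u in R ...
            continue                               # ... and skip its outgoing edges
        for v, (c, r) in adj.get(u, {}).items():
            nd = d + c + mu * r
            if v not in settled and (v not in dist or nd < dist[v]):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    if t not in settled:
        return None, {}
    d_t = settled[t]
    # Settled nodes keep their distance; unsettled ones get d_mu(s, t) as a bound.
    lower = {u: settled.get(u, d_t) for u in adj if u not in reduced}
    return d_t, lower


adj = {  # hypothetical graph, adj[u][v] = (cost, resource)
    "s": {"a": (1, 10), "b": (5, 2), "d": (2, 5)},
    "a": {"t": (1, 10)}, "b": {"t": (5, 2), "e": (100, 0)},
    "d": {"t": (2, 5)}, "e": {}, "t": {},
}
R = set()
d_t, lower = pruned_dijkstra(adj, "s", "t", 0.5, R, lambda u: u == "a")
print(d_t, R, lower)
```

In this toy run node a is added to $R$ and its outgoing edges are never scanned, while the unsettled node e receives $d_\mu(s, t) = 9$ as its new lower bound.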

4.2 Closing the Gap 

To close the gap, we follow a label setting method that is essentially a modification of the one proposed in [9] for the RCSP problem. In the additive shortest path problem, Dijkstra's algorithm keeps a label for each node $u$ representing the tentative distance of an s-u path. In our case each path $p$ has a cost and a resource, and thus its corresponding label is a tuple $(r_p, c_p)$. Consequently, a set of labels corresponding to s-u paths has to be stored with each node $u$, since now there is no total order among them. However, we can exploit the fact that some paths may dominate other paths, where a path $p$ with label $(r_p, c_p)$ dominates a path $q$ with label $(r_q, c_q)$ if $c_p \le c_q$ and $r_p \le r_q$. Hence, at each node $u$ we maintain a set $L(u)$ of undominated s-u paths, that is, no path in $L(u)$ dominates any other path in the set. Let $c_{uv}$ (resp. $r_{uv}$) denote the cost (resp. resource) of the shortest u-v path w.r.t. cost (resp. resource), and let $c_{\max}$ (resp. $r_{\max}$) be the upper bound on the cost (resp. resource) of the optimal path returned by phase 1. We call an s-u path $p$ of cost $c_p$ and resource $r_p$ non-promising if any of the following holds: (i) $c_p + c_{ut} > c_{\max}$; (ii) $r_p + r_{ut} > r_{\max}$; (iii) there exists a Pareto-optimal u-t path $q$ of cost $c_q$ and resource $r_q$ for which $c_p + c_q > c_{\max}$ and $r_p + r_q > r_{\max}$. Clearly, a non-promising path cannot be extended to an optimal one and hence it can be discarded. We note that in [9] only (i) and (ii) are checked. Our experiments revealed, however, that gap closing involving (iii) can be considerably faster (cf. Section 6). The gap closing algorithm is implemented with the use of a priority queue, whose elements are tuples of the form $(w_p, (r_p, c_p))$, where $w_p = c_p + \mu_m r_p$ is the combined cost of $p$ and $\mu_m$ is the final multiplier of phase 1. Paths are examined in the order they are hit by sweeping the line connecting $p_l$ and $p_h$ along its normal direction.
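The label sets $L(u)$ can be maintained with a simple dominance filter. The sketch below stores labels as $(r_p, c_p)$ tuples as in the text and implements only checks (i) and (ii); check (iii), which requires the Pareto-optimal u-t paths, is omitted. All names are hypothetical, and a plain list is used for clarity rather than efficiency.

```python
def dominates(p, q):
    """Label p = (r_p, c_p) dominates q = (r_q, c_q) if c_p <= c_q and r_p <= r_q."""
    return p[1] <= q[1] and p[0] <= q[0]


def insert_label(labels, new):
    """Keep `labels` (the set L(u)) free of dominated labels.

    Returns False if `new` is dominated by a stored label; otherwise drops
    the labels that `new` dominates, appends `new` and returns True.
    """
    if any(dominates(old, new) for old in labels):
        return False
    labels[:] = [old for old in labels if not dominates(new, old)]
    labels.append(new)
    return True


def non_promising(label, c_ut, r_ut, c_max, r_max):
    """Checks (i) and (ii); c_ut, r_ut: minimum cost / minimum resource of a u-t path."""
    r_p, c_p = label
    return c_p + c_ut > c_max or r_p + r_ut > r_max


L_u = []
for label in [(10, 4), (4, 10), (12, 12), (3, 3)]:
    print(label, insert_label(L_u, label), L_u)
```

The label (12, 12) is rejected because (10, 4) dominates it, and (3, 3) then evicts both stored labels.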



5 Generalization 

The generalization of the objective function (3) involves two non-decreasing and convex functions $U_c: \mathbb{R}_0^+ \to \mathbb{R}_0^+$ and $U_r: \mathbb{R}_0^+ \to \mathbb{R}_0^+$ translating a path's cost (travel distance) and resource (travel time), respectively, to the common utility cost (money). We can also incorporate restrictions that may exist on the total cost or the resource of a path, thus adding constraints of the form $\sum_{p \in P} c_p x_p \le B_c$ and $\sum_{p \in P} r_p x_p \le B_r$, for some constants $B_c, B_r > 0$. Consequently, the resulting INLP is as follows:

$$\begin{aligned}
\min \quad & U_c\Big(\sum_{p \in P} c_p x_p\Big) + U_r\Big(\sum_{p \in P} r_p x_p\Big) \\
\text{s.t.} \quad & \sum_{p \in P} c_p x_p \le B_c; \quad \sum_{p \in P} r_p x_p \le B_r \\
& \sum_{p \in P} x_p = 1; \quad x_p \in \{0, 1\}, \ \forall p \in P
\end{aligned} \tag{9}$$

and constitutes a generalization of both NASP and RCSP. We can show that the idea of the non-negative composite edge weights $w(\cdot) = c(\cdot) + \mu r(\cdot)$ (based on a Lagrangian multiplier $\mu$) provides us with a polynomial algorithm for solving the relaxed version of the above INLP. We provide generalized versions of the key Lemmata 4 and 5 (see [13]), which allow us to apply the hull approach and obtain the desired solution. The generalized versions of these Lemmata result in certain modifications of the extended hull algorithm (see [13]), without sacrificing its worst-case complexity.
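To see how (9) covers both special cases, the following sketch evaluates its objective for single paths, returning infinity when a budget is violated, and picks the best of three hypothetical (cost, resource) pairs. Taking $U_c$ as the identity and $U_r = U$ gives NASP, while $U_c$ as the identity, $U_r \equiv 0$ and a finite $B_r$ gives RCSP. The function name and the numbers are illustrative assumptions.

```python
import math


def generalized_objective(c, r, U_c, U_r, B_c=math.inf, B_r=math.inf):
    """Objective of INLP (9) for one path with cost c and resource r.

    Returns math.inf when a budget constraint c <= B_c or r <= B_r is violated.
    """
    if c > B_c or r > B_r:
        return math.inf
    return U_c(c) + U_r(r)


paths = [(2, 20), (4, 10), (10, 4)]          # hypothetical (cost, resource) pairs
identity = lambda x: x

# NASP: U_c is the identity, U_r = U (here z^2 / 100), no budgets.
nasp = min(paths, key=lambda p: generalized_objective(*p, identity, lambda z: z * z / 100))
# RCSP: U_c is the identity, U_r = 0, resource budget B_r = 9.
rcsp = min(paths, key=lambda p: generalized_objective(*p, identity, lambda z: 0, B_r=9))
print(nasp, rcsp)    # (4, 10) (10, 4)
```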
6 Experiments

We conducted a series of experiments based on several implementations^ of the extended hull algorithm and of the exact algorithm with the heuristics described in Section 4. In particular, for the solution of the Lagrangian relaxation of NASP (phase 1 of the exact algorithm), we have implemented the extended hull algorithm (EHA) of Section 3.2 (using Dijkstra's algorithm for shortest path computations and aborting when the target node is found), the variant of the hull approach described in [10] using the bidirectional Dijkstra's algorithm (MW), and EHA using different heuristics: EHA[B] (with the bidirectional Dijkstra's algorithm), EHA[G] (with the gradient heuristic), EHA[NR] (with the on-line node reductions heuristic), EHA[G+NR] (gradient and on-line node reductions heuristics), and EHA[G+B] (gradient heuristic with bidirectional Dijkstra).

With this set of implementations, we performed experiments on random grid graphs and digital elevation models (DEMs), following the experimental set-up in [9], as well as on real-world data based on US road networks. For
grid graphs, we used integral edge costs and resources chosen randomly and 
independently from [100,200]. A DEM is a grid graph where each node has an 
associated height value. We used integral values as heights chosen randomly 
and independently from [100, 200]. The same applies for the edge resources. The 
cost of an edge was the absolute difference between the heights of its endpoints. 
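For readers who wish to build comparable inputs, the sketch below generates the two kinds of synthetic instances just described. The 4-neighbourhood with edges in both directions, the independent draws per direction, and the function names are our assumptions, since the text does not fix these details.

```python
import random

STEPS = ((0, 1), (1, 0), (0, -1), (-1, 0))   # assumed 4-neighbourhood, both directions


def grid_graph(N, seed=0):
    """N x N grid; every directed edge gets cost and resource uniform in [100, 200]."""
    rng = random.Random(seed)
    adj = {(i, j): {} for i in range(N) for j in range(N)}
    for (i, j) in adj:
        for di, dj in STEPS:
            v = (i + di, j + dj)
            if v in adj:
                adj[(i, j)][v] = (rng.randint(100, 200), rng.randint(100, 200))
    return adj


def dem_graph(N, seed=0):
    """DEM: node heights uniform in [100, 200]; cost = |height difference|,
    resource uniform in [100, 200]."""
    rng = random.Random(seed)
    height = {(i, j): rng.randint(100, 200) for i in range(N) for j in range(N)}
    adj = {u: {} for u in height}
    for (i, j) in height:
        for di, dj in STEPS:
            v = (i + di, j + dj)
            if v in height:
                cost = abs(height[(i, j)] - height[v])
                adj[(i, j)][v] = (cost, rng.randint(100, 200))
    return adj


g, dem = grid_graph(5), dem_graph(5)
print(sum(len(nbrs) for nbrs in g.values()))   # 80 directed edges, i.e. 4*N*(N-1)
```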
In both types of input, we used the objective function (3) which is the most 
commonly used in transportation applications [4,5,12] with U{r) quadratic and 
appropriately scaled; that is, the objective function was + ( ^ t) where 

by $S_\lambda(s,t)$ we denote the weight of the shortest s-t path with respect to the weight function $\lambda(\cdot)$.

The experimental results on grid graphs are reported in Table 1 (similar 
results hold for DEMs; cf. [13]). We report on time performances (secs) and 
on the number of shortest path computations (#SP). The reported values are
averages over several query (s, t) pairs that involve border nodes of the grid 
(similar results hold for randomly chosen pairs). 

For the solution of the relaxed problem (phase 1 of the exact algorithm), we observe that MW, despite the fact that it is implemented using bidirectional Dijkstra (which is usually faster than pure Dijkstra), has an inferior performance w.r.t. both EHA and EHA[B]. This is due to the fact that it performs many more shortest path computations, since MW does not use derivatives. Regarding the heuristics, we observe that both gradient and on-line node reductions are rather powerful heuristics. In all cases, the gradient heuristic reduces the number of shortest path computations by 20% to 60%, and the improvement is more evident as the graph size gets larger. The on-line node reductions heuristic achieves
in almost all cases a better running time than the gradient heuristic. The most 
interesting characteristic of this heuristic is that its performance increases with 
the number of shortest path computations. This is due to the successive re- 
duction of the problem size. Finally, the combination of both heuristics gives 

^ Implementations were in C++ using LEDA. Experiments were carried out on a SunFire 280R with an UltraSPARC-III processor at 750 MHz and 512 MB of memory.
Table 1. #GCP denotes the number of (s, t) pairs that needed pruning and gap closing.

N x N grid, 100 border (s, t) pairs (times in seconds)

N            50              100             200             400              600
Implem.      #SP    Time     #SP    Time     #SP    Time     #SP    Time      #SP    Time
MW           9.35   0.125    10.9   0.650    12.3   4.862    14.8   33.540    15.5   91.640
EHA          6.74   0.095    7.61   0.490    8.31   3.359    9.68   22.128    9.97   61.627
EHA[B]       6.74   0.088    7.61   0.444    8.31   3.211    9.68   21.648    9.97   57.902
EHA[G]       4.35   0.061    4.14   0.260    4.38   1.721    4.73   10.609    4.62   27.965
EHA[NR]      6.74   0.063    7.61   0.281    8.31   1.683    9.68   9.6354    9.97   25.354
EHA[G+NR]    4.35   0.053    4.14   0.234    4.38   1.451    4.73   8.3676    4.62   21.879
EHA[G+B]     4.35   0.054    4.14   0.233    4.38   1.650    4.73   10.437    4.62   26.290

N x N grid, 10000 border (s, t) pairs (times in seconds)

N            100                  200
#GCP         101                  182
Phase        Time     #delmin     Time     #delmin
pruning      0.267    -           2.209    -
gc[MZ]       0.359    14260.9     32.420   744056
gc[new]      0.276    9459.8      17.440   456400

Table 2. Nodes s and t are chosen randomly among all nodes.

Road networks, 1000 random (s, t) pairs

Input graph  U.S. NHPN        Florida          Alabama
Implem.      #SP     Time     #SP     Time     #SP     Time
MW           10.58   3.43     10.19   2.05     9.88    2.45
EHA          7.34    2.44     7.10    1.33     6.88    2.14
EHA[B]       7.34    2.32     7.10    1.45     6.88    1.70
EHA[G]       4.01    1.23     4.01    0.69     3.98    1.15
EHA[NR]      7.34    1.11     7.10    0.63     6.88    0.97
EHA[G+NR]    4.01    1.04     4.01    0.60     3.98    0.95
EHA[G+B]     4.01    1.14     4.01    0.73     3.98    0.89

Road networks, 10000 random (s, t) pairs

Graph        U.S. NHPN            Florida              Alabama
#GCP         no                   51                   74
Phase        Time    #delmin      Time    #delmin      Time    #delmin
pruning      0.92    -            0.43    -            0.74    -
gc[MZ]       5.64    201156       1.46    37284        1.34    32940
gc[new]      4.68    144848       0.94    30889        1.30    29620

further improvements and makes EHA[G+NR] and EHA[G+B] the fastest implementations of the exact algorithm. Regarding pruning and gap closing (phases 2 and 3 of the exact algorithm), our experiments revealed that these two phases were performed only for a very small number of instances, actually less than 2%; see the lower part of Table 1. This implies that for more than 98% of the instances tested, the time bounds of Table 1 (upper part) essentially provide the total running time of the exact algorithm. We made two implementations of gap closing: the first is the one described in [9] (gc[MZ]), while the second is our method described in Section 4.2 (gc[new]) that uses Pareto path information. The larger the graph, the better the performance of gc[new], since many more paths are discarded and thus fewer delete-min operations are executed.

Regarding the experiments on real road networks from the USA, we consider the high-detail state road networks of Florida and Alabama, and the NHPN (National Highway Planning Network) covering the entire continental US. The data sets of the state networks consist of four levels of road networks (interstate highways, principal arterial roads, major arterial roads, and rural minor arterial roads). The Florida high-detail road network consists of 50109 nodes and 133134 edges, the Alabama high-detail road network consists of 66082 nodes and 185986 edges, and the US NHPN consists of 75417 nodes and 205998 edges. These particular data sets were also used in [14]. In all cases the cost and the resource of an edge were taken equal to the actual distance between its endpoints (provided with the data) multiplied by a uniform random variable in [1, 2] (to differentiate the cost from the resource value), and subsequently scaled and rounded appropriately so that the resulting numbers are integers in [0, 10000]. The non-linear objective function considered in all cases was the same as that for grid graphs and DEMs. The experimental results (see Table 2) are similar to those obtained using grid graphs and DEMs.
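The edge-value preprocessing can be sketched as follows. Independent random multipliers for cost and resource, a single common linear scale factor, and Python's `round` are assumptions made for illustration only, since the text merely says the values were scaled and rounded appropriately.

```python
import random


def road_edge_values(distances, seed=0, upper=10_000):
    """Turn edge lengths into integral (cost, resource) pairs in [0, upper].

    Each length is multiplied by two independent U[1, 2] draws (assumed), then
    all values share one linear scale factor so that the largest maps to `upper`.
    """
    rng = random.Random(seed)
    raw = [(d * rng.uniform(1, 2), d * rng.uniform(1, 2)) for d in distances]
    top = max(max(c, r) for c, r in raw) or 1.0   # guard against all-zero input
    return [(round(c / top * upper), round(r / top * upper)) for c, r in raw]


print(road_edge_values([1.0, 2.5, 0.0, 7.25], seed=1))
```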